ABSTRACT. We consider first the mixed discrete-continuous scheme of observation in multi-state models. This is a classical pattern in epidemiology, because clinical status is very often assessed at discrete visit times while times of death or other events are observed exactly. A heuristic likelihood can be written for such models, at least for Markov models; however, a formal proof is not easy and has not been given yet. We present a general class of possibly non-Markov multi-state models which can be represented naturally as multivariate counting processes. We give a rigorous derivation of the likelihood by applying Jacod's formula for the full likelihood and taking a conditional expectation for the observed likelihood. A local description of the likelihood allows us to extend the result to a more general coarsening observation scheme proposed by Commenges & Gégout-Petit (2005). The approach is illustrated by considering models for dementia, institutionalization and death.
1 Introduction



Multi-state models have been proposed for a long time, in particular in biological applications. Until the seventies, however, attention was essentially focused on homogeneous Markov models. Non-homogeneous Markov models were studied by Fleming (1978) and Aalen & Johansen (1978); semi-Markov models were also considered by Lagakos, Sommer & Zelen (1978); the most studied model was the illness-death model (Andersen, 1988; Keiding, 1991). A thorough account of these developments can be found in Andersen et al. (1993), where counting process theory is used to obtain rigorous results in this field; see also Hougaard (2000).

Another stream of research was started by Peto (1973) and Turnbull (1976), who tackled the problem of interval-censored observations in survival data analysis and gave the non-parametric maximum likelihood estimator. Frydman (1995a, 1995b) extended this approach to the illness-death model, in which the transition toward illness could be interval-censored while the time of death was exactly observed, and Wong (1999) studied a general case of multivariate interval-censored data. The penalized likelihood approach to this problem was proposed by Joly & Commenges (1999) with an application to AIDS, and by Joly et al. (2002) and Commenges et al. (2004) with an application to Alzheimer's disease.

In the continuous time observation scheme it has been shown (Aalen, 1978; Borgan, 1984; Andersen et al., 1993) that the likelihood can be derived from Jacod's formula (Jacod, 1975); in this case martingale theory also yields simple and natural estimators (Andersen et al., 1993; Aalen et al., 2004). When some transitions are observed in discrete time while others are observed in continuous time, a setting we call the Mixed Discrete-Continuous Observation (MDCO) scheme, the natural martingale estimators are no longer available, so one has to return to likelihood-based methods. The papers cited above that consider this case used heuristic likelihoods. Commenges (2003) derived the likelihood for this observation scheme in a Markov illness-death model. The main aim of this paper is to derive the likelihood rigorously in a more general framework: i) we consider a general class of multi-state models which may have any number of states and are not necessarily Markov; ii) we first consider the MDCO scheme and then extend the result to the so-called GCMP (General Coarsening Model for Processes) proposed by Commenges & Gégout-Petit (2005). In most of this work we assume that the mechanism leading to incomplete data is ignorable (Gill et al., 1997); however, we prove ignorability in the special case when death produces a stochastic censoring of the other processes of the model. Our approach starts from the remark that most of the useful multi-state models can be formulated directly, and very naturally, in terms of multivariate counting processes, an idea close to the "composable processes" studied by Schweder (1970). Jacod's formula can then be applied to find the likelihood for continuous time observation. When one or several of these processes are observed in discrete time, the likelihood can be computed by taking a conditional expectation of the full likelihood.

In section 2 the heuristic likelihoods for diverse observation schemes are recalled. Section 3 develops a natural correspondence between the class of irreversible multi-state processes and multivariate one-jump counting processes. In section 4 the multivariate counting process representation is exploited to apply Jacod's formula for finding the likelihood in the mixed discrete-continuous observation scheme; the result is then extended to the GCMP scheme. This general modelling approach is illustrated in section 5 by describing a joint model for dementia, institutionalization and death, presented as a five-state model in Commenges & Joly (2004), and by showing the benefits of the proposed approach in this case. Section 6 briefly concludes.

2 Heuristic likelihoods for multi-state models 
2.1 Notation 

A multi-state process $X = (X_t)$ is a right-continuous process which can take a finite number of values $\{0, 1, \ldots, K-1\}$. The theory of Markov multi-state models (or Markov chain models) is well established. The law of a Markov multi-state process is defined by the transition probabilities between states $h$ and $j$, which we denote by $p_{hj}(s,t) = P(X_t = j \mid X_s = h)$. Transition intensities $\alpha_{hj}(t)$, for $h \neq j$, may be defined when the $p_{hj}(s,t)$'s are continuous both in $s$ and in $t$ for all $s$ and $t$, as the following limits (if they exist):

$$\alpha_{hj}(t) = \lim_{\Delta t \to 0} p_{hj}(t, t+\Delta t)/\Delta t. \qquad (1)$$

In most epidemiological applications it is reasonable to assume that these limits exist; it is even reasonable to expect continuous and smooth transition intensities. We define $\alpha_{hh}(t) = -\sum_{j \neq h} \alpha_{hj}(t)$; $\sum_{j \neq h} \alpha_{hj}(t)$ is the hazard function associated with the distribution of the sojourn time in state $h$. When the $\alpha_{hj}(t)$'s do not depend on $t$, the Markov chain is said to be homogeneous. Transition probabilities and transition intensities are linked by the forward Kolmogorov differential equations. By solving these equations, transition probabilities can be expressed as a function of transition intensities; this solution can take the form of the product integral (see Andersen et al., 1993). For non-Markov multi-state models one could define transition probabilities analogously, $p_{hj}(s,t) = P(X_t = j \mid X_s = h, \mathcal{F}_{s-})$, where $\mathcal{F}_{s-}$ is the history before $s$; similarly, transition intensities $\alpha_{hj}(t; \mathcal{F}_{t-})$ could be defined. However, we are then on less firm ground because, contrary to the Markov case, these quantities are random, and the Kolmogorov equations have been given only for Markov processes.
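The following is a minimal numerical sketch of the link between intensities and transition probabilities. It assumes the three-state illness-death model (0 health, 1 illness, 2 death) and illustrative intensity values. `transition_matrix` approximates the product integral $\prod_{(s,t]}(I + Q(u)\,du)$ by a finite product over a grid, where $Q(u)$ holds the $\alpha_{hj}(u)$ off the diagonal and $\alpha_{hh}(u)$ on it. All function names are hypothetical. For constant intensities, the result is compared with the closed-form solution of the forward Kolmogorov equations.

```python
import math

def generator(t, a01, a02, a12):
    """Intensity matrix Q(t) of the illness-death model (states 0, 1, 2).

    a01, a02, a12 are callables t -> alpha_hj(t); the diagonal holds
    alpha_hh(t) = -sum_{j != h} alpha_hj(t), so each row sums to zero.
    """
    q01, q02, q12 = a01(t), a02(t), a12(t)
    return [[-(q01 + q02), q01, q02],
            [0.0, -q12, q12],
            [0.0, 0.0, 0.0]]

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transition_matrix(s, t, a01, a02, a12, steps=2000):
    """Approximate P(s, t) by the finite product of (I + Q(u_k) h) over a grid of (s, t]."""
    h = (t - s) / steps
    P = [[float(i == j) for j in range(3)] for i in range(3)]
    for k in range(steps):
        Q = generator(s + (k + 0.5) * h, a01, a02, a12)   # midpoint of sub-interval k
        step = [[float(i == j) + Q[i][j] * h for j in range(3)] for i in range(3)]
        P = matmul(P, step)
    return P

def closed_form(delta, a01, a02, a12):
    """Exact P(s, s + delta) for constant intensities (assumes a01 + a02 != a12)."""
    a0 = a01 + a02
    p00 = math.exp(-a0 * delta)
    p11 = math.exp(-a12 * delta)
    p01 = a01 / (a0 - a12) * (math.exp(-a12 * delta) - math.exp(-a0 * delta))
    return [[p00, p01, 1 - p00 - p01], [0.0, p11, 1 - p11], [0.0, 0.0, 1.0]]

P = transition_matrix(0.0, 5.0, lambda t: 0.1, lambda t: 0.05, lambda t: 0.2)
E = closed_form(5.0, 0.1, 0.05, 0.2)
```

Passing time-dependent callables (for instance `lambda t: 0.01 * t`) gives a non-homogeneous model with no change to the code. The grid size `steps` trades accuracy for speed.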

In the remainder of this paper we shall consider different schemes of observation, leading to incomplete data. We assume that the mechanism leading to incomplete data is ignorable: this means that we make a correct inference by using the likelihood as if these schemes were deterministic. In particular, we can consider that the different times involved in this mechanism, such as $C$ and $v_l$, $l = 0, \ldots, m$, which are defined below, are fixed. This, however, raises a problem which will be discussed and solved in section 4.5.

2.2 Likelihood for continuous time observations in Markov models

Consider the case where process $X$ is continuously observed from $v_0$ to $C$. We observe that transitions have occurred at (exactly) times $T_{(1)} < T_{(2)} < \cdots < T_{(M)}$. With the convention $T_{(0)} = v_0$, the value of the likelihood, conditional on $X_{v_0}$, on the event $\{M = m\}$ and $\{X_{T_{(r)}} = x_r, r = 0, \ldots, m\}$ is:

$$\mathcal{L} = \Big[\prod_{r=1}^{m} p_{x_{r-1}x_{r-1}}(T_{(r-1)}, T_{(r)}-)\,\alpha_{x_{r-1}x_r}(T_{(r)})\Big]\, p_{x_m x_m}(T_{(m)}, C).$$

The probability that no transition happens between $T_{(r-1)}$ and $T_{(r)}-$ given $X_{T_{(r-1)}} = x_{r-1}$ can easily be computed as

$$p_{x_{r-1}x_{r-1}}(T_{(r-1)}, T_{(r)}-) = \exp \int_{T_{(r-1)}}^{T_{(r)}} \alpha_{x_{r-1}x_{r-1}}(u)\,du.$$
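For constant (homogeneous) intensities the integral above reduces to $\alpha_{x_{r-1}x_{r-1}}\,(T_{(r)} - T_{(r-1)})$. The log-likelihood of a continuously observed path can therefore be evaluated in closed form. The following sketch assumes homogeneity and a hypothetical encoding of the path as a list of (transition time, new state) pairs. It is an illustration, not a general implementation.

```python
import math

def loglik_continuous(Q, x0, v0, jumps, C):
    """Log of the continuous-observation likelihood of section 2.2 (homogeneous case).

    Q     : K x K matrix of constant intensities, Q[h][h] = -sum_{j != h} Q[h][j]
    x0    : state at v0 (the likelihood is conditional on X_{v0} = x0)
    jumps : observed transitions [(T_(1), x_1), ..., (T_(m), x_m)], v0 < T_(1) < ... <= C
    """
    ll, t_prev, x_prev = 0.0, v0, x0
    for t, x in jumps:
        ll += Q[x_prev][x_prev] * (t - t_prev)   # log p_{x x}(T_(r-1), T_(r)-)
        ll += math.log(Q[x_prev][x])             # log alpha_{x_{r-1} x_r}(T_(r))
        t_prev, x_prev = t, x
    ll += Q[x_prev][x_prev] * (C - t_prev)       # log p_{x_m x_m}(T_(m), C)
    return ll

# Illness-death model: 0 = health, 1 = illness, 2 = death (absorbing).
Q = [[-0.15, 0.10, 0.05],
     [0.00, -0.20, 0.20],
     [0.00, 0.00, 0.00]]
# Subject healthy at 0, ill at time 2, still alive (and ill) at C = 5.
ll = loglik_continuous(Q, x0=0, v0=0.0, jumps=[(2.0, 1)], C=5.0)
```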
2.3 Likelihood for discrete time observations in Markov models

Consider now the case where $X$ is observed at discrete times $v_0, v_1, \ldots, v_m$. In this case, we observe a vector of random variables $(X_{v_0}, \ldots, X_{v_m})$, and it is easy to derive the value of the likelihood, conditional on $X_{v_0}$, on the event $\{X_{v_r} = x_r, r = 0, \ldots, m\}$:

$$\mathcal{L} = \prod_{r=0}^{m-1} p_{x_r x_{r+1}}(v_r, v_{r+1}).$$
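Under the same assumptions (homogeneous illness-death model, illustrative intensities), the discrete-time likelihood is a product of closed-form transition probabilities. It uses $p_{00}(s,t) = e^{-(\alpha_{01}+\alpha_{02})(t-s)}$, $p_{11}(s,t) = e^{-\alpha_{12}(t-s)}$ and the corresponding expression for $p_{01}$. The function names are hypothetical.

```python
import math

def closed_form(delta, a01, a02, a12):
    """Exact P(s, s + delta) of the homogeneous illness-death model (assumes a01 + a02 != a12)."""
    a0 = a01 + a02
    p00 = math.exp(-a0 * delta)
    p11 = math.exp(-a12 * delta)
    p01 = a01 / (a0 - a12) * (math.exp(-a12 * delta) - math.exp(-a0 * delta))
    return [[p00, p01, 1 - p00 - p01], [0.0, p11, 1 - p11], [0.0, 0.0, 1.0]]

def lik_discrete(visits, states, a01, a02, a12):
    """Product over r of p_{x_r x_{r+1}}(v_r, v_{r+1}), conditional on X_{v_0} = x_0."""
    L = 1.0
    for r in range(len(visits) - 1):
        P = closed_form(visits[r + 1] - visits[r], a01, a02, a12)
        L *= P[states[r]][states[r + 1]]
    return L

# Healthy at visits 0 and 1, found ill at visit 3.
L = lik_discrete([0.0, 1.0, 3.0], [0, 0, 1], 0.1, 0.05, 0.2)
```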

2.4 Likelihood for mixed discrete-continuous time observations in Markov models

The most common case in applications is that some transitions are observed in discrete time and others in continuous time. A classical example is the irreversible illness-death model, a model with the three states "health", "illness" and "death", labeled 0, 1 and 2 respectively. It is often the case that the transition toward the illness state is observed in discrete time while transitions toward death are observed in continuous time (Frydman, 1995a; Joly et al., 2002). Let us call $\tilde T$ the follow-up time, that is $\tilde T = \min(T, C)$, where $T$ is the time of death; we observe $\tilde T$ and $\delta = 1_{\{T < C\}}$. Suppose the subject starts in state "health", has never been observed in the "illness" state and was last seen at visit $M$ (at time $v_m$). Then the likelihood, conditional on $X_{v_0} = 0$, on the event $\{M = m\}$ is:

$$\mathcal{L} = p_{00}(v_0, v_m)\big[p_{00}(v_m, \tilde T)\,\alpha_{02}(\tilde T)^{\delta} + p_{01}(v_m, \tilde T)\,\alpha_{12}(\tilde T)^{\delta}\big]; \qquad (2)$$

if the subject has been observed in the illness state for the first time at $v_l$, then the likelihood is:

$$\mathcal{L} = p_{00}(v_0, v_{l-1})\,p_{01}(v_{l-1}, v_l)\,p_{11}(v_l, \tilde T)\,\alpha_{12}(\tilde T)^{\delta}. \qquad (3)$$
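Formulae (2) and (3) can be evaluated directly when the intensities are constant. The sketch below assumes that case (with $\alpha_{01} + \alpha_{02} \neq \alpha_{12}$), and the function names are hypothetical. As a sanity check, for a subject healthy at $v_m$ the death density $p_{00}(v_m, t)\alpha_{02} + p_{01}(v_m, t)\alpha_{12}$ integrates to 1 over $t > v_m$. The last lines compute this integral numerically.

```python
import math

def p_ill_death(delta, a01, a02, a12):
    """(p00, p01, p11) over an interval of length delta; constant intensities, a01 + a02 != a12."""
    a0 = a01 + a02
    p00 = math.exp(-a0 * delta)
    p11 = math.exp(-a12 * delta)
    p01 = a01 / (a0 - a12) * (math.exp(-a12 * delta) - math.exp(-a0 * delta))
    return p00, p01, p11

def lik_never_ill(v0, vm, T, d, a01, a02, a12):
    """Formula (2): healthy at the last visit vm, follow-up time T, death indicator d."""
    p00_a, _, _ = p_ill_death(vm - v0, a01, a02, a12)
    p00_b, p01_b, _ = p_ill_death(T - vm, a01, a02, a12)
    return p00_a * (p00_b * a02 ** d + p01_b * a12 ** d)

def lik_first_ill(v0, vprev, vl, T, d, a01, a02, a12):
    """Formula (3): healthy at vprev = v_{l-1}, first seen ill at vl = v_l."""
    p00, _, _ = p_ill_death(vprev - v0, a01, a02, a12)
    _, p01, _ = p_ill_death(vl - vprev, a01, a02, a12)
    _, _, p11 = p_ill_death(T - vl, a01, a02, a12)
    return p00 * p01 * p11 * a12 ** d

a = (0.1, 0.05, 0.2)                     # illustrative alpha_01, alpha_02, alpha_12
h = 0.01                                 # midpoint rule over (2, 302) for the death density
total = h * sum(lik_never_ill(0.0, 2.0, 2.0 + (k + 0.5) * h, 1, *a) for k in range(30000))
total /= math.exp(-0.15 * 2.0)           # divide by p00(0, 2)
```

With these values `total` is close to 1, as expected for a subject who is certain to die eventually.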

These formulae can be extended rather easily to models with more than three states when there is one absorbing state, transitions toward the absorbing state are observed in continuous time and transitions toward other states are observed in discrete time (Commenges, 2002). When transitions toward one state are observed in discrete time while transitions toward other states are observed in continuous time, the formula becomes cumbersome: see the example of the Dementia-Institution-Death model of Commenges & Joly (2004). These formulae are heuristic. The same type of formulae can be written for non-Markov models by using random versions of the transition probabilities and intensities, in the hope of being able to compute transition probabilities in terms of transition intensities; see Joly & Commenges (1999) for an example of a semi-Markov model.

3 Multi-State models as multivariate counting processes

The main motivation for representing multi-state models as multivariate counting processes is the availability of a formula giving the likelihood ratio for such processes (not necessarily Markov) observed in continuous time on $[0, C]$ (Jacod, 1975). It will be shown in the next section that this can be the basis of a rigorous derivation of the likelihood also in the MDCO and in the more general GCMP schemes. It was shown (Borgan, 1984; Andersen et al., 1993) that a multi-state model $(X, \alpha)$, where $\alpha = (\alpha_{hj}(\cdot); 0 \le h, j \le K-1)$, can be represented by the multivariate counting process $(N, \lambda)$. Here $N = (N_{hj}, h \neq j, 0 \le h, j \le K-1)$, and the $N_{hj}$'s count the number of transitions from state $h$ to state $j$. The intensity of $N$ is $\lambda = (\lambda_{hj}, h \neq j, 0 \le h, j \le K-1)$ with $\lambda_{hj}(t) = Y_h(t)\,\alpha_{hj}(t)$, where $Y_h(t) = 1_{\{X_{t-} = h\}}$.

For irreversible multi-state models a more parsimonious representation is possible. It is often possible to formulate an epidemiological problem directly in terms of counting processes, rather than using the counting process representation as a mathematical device. Most multi-state models are in fact used for jointly modeling several events. For instance, the illness-death model is used to jointly model onset of disease and death. So we can model the problem directly by considering a bivariate counting process $N = (N_1, N_2)$, where $N_1$ counts the onset of the disease and $N_2$ counts the occurrence of death. The multi-state process can be retrieved as $X_t = \min(2N_{2t} + N_{1t}, 2)$. The processes $N$ and $X$ generate the same filtration. Note, however, that for a fixed $t$ the random variables $X_t$ and $N_t$ do not in general generate the same $\sigma$-field. The event $\{X_t = 2\}$ (subject in the "death" state at $t$) is the same as $\{N_{2t} = 1\}$, but if we know $N_t$ we also know $N_{1t}$, that is, we know whether or not the subject passed through the state "illness" before $t$. Note that the representation based on the basic events of interest is more economical than the Borgan representation (a bivariate rather than a three-variate process). Similarly, the Dementia-Institution-Death model, which is a five-state model (see section 5), can be represented by a three-variate process counting onset of dementia, institutionalization and death. This model has eight possible transitions, so the Borgan representation would entail an eight-variate process.

Let us consider the more general problem of jointly modeling the onset of $p$ types of events, each type occurring just once. This can be represented by a $p$-variate counting process $N$, each $N_j$ making at most one jump. It is possible to construct a multi-state process $W$ such that $W_t$ generates the same $\sigma$-field as $N_t$. One possibility is $W_t = N_{pt}2^{p-1} + N_{p-1,t}2^{p-2} + \cdots + N_{1t}$, which we denote by $W_t = N_{pt}N_{p-1,t}\ldots N_{1t}$ (this is the representation of $W_t$ in base 2); $W_t$ can take $2^p$ integer values in the set $\{0, 1, \ldots, 2^p - 1\}$. Consider the important case where $N_p$ counts death. In the multi-state representation it is common to consider that deceased subjects are in the same state. We may therefore construct a more compact multi-state model $X$ defined by $X_t = \min(W_t, 2^{p-1})$; $X_t$ can take $2^{p-1} + 1$ values. For example, the Dementia-Institution-Death model (where $p = 3$) has five states rather than eight. We have exactly the same number of non-zero transition intensities for $W$ and $X$, and they are equal; we simply have to rename them. More specifically, we have $\alpha^W_{hj}(\cdot) = \alpha_{hj}(\cdot)$ for $0 \le h < j < 2^{p-1}$ and $\alpha^W_{hj}(\cdot) = \alpha_{h2^{p-1}}(\cdot)$ for $0 \le h < 2^{p-1} \le j$.
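The base-2 construction and the compact state $X_t = \min(W_t, 2^{p-1})$ can be sketched as follows. The encoding of $N_t$ as a tuple $(N_{1t}, \ldots, N_{pt})$ and the function names are assumptions made for illustration. For $p = 3$ the sketch enumerates the eight values of $W$ and the five states of the Dementia-Institution-Death model.

```python
from itertools import product

def w_state(n):
    """W_t = N_pt ... N_1t read in base 2; n = (N_1t, ..., N_pt) with 0/1 entries."""
    return sum(nj << j for j, nj in enumerate(n))

def x_state(n):
    """Compact state X_t = min(W_t, 2^(p-1)), assuming the last component N_p counts death."""
    return min(w_state(n), 2 ** (len(n) - 1))

# p = 3: 2^3 = 8 values of W but only 2^2 + 1 = 5 states of X.
w_values = {w_state(n) for n in product((0, 1), repeat=3)}
x_values = {x_state(n) for n in product((0, 1), repeat=3)}
```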

Theorem 1 Let $N = (N_1, \ldots, N_p)$ be a counting process with $N_{jt} \le 1$; $j = 1, \ldots, p$; $t \ge 0$ and $p > 1$. Consider the multi-state process $W = (W_t)$ defined by $W_t = N_{pt}\ldots N_{1t}$ in base 2. If, under a given probability measure, $W$ is Markov with continuous transition intensities $\alpha^W_{hj}(\cdot)$, $0 \le h, j \le 2^p - 1$, then the generating counting processes $N_j$, $j = 1, \ldots, p$, have intensities given by:

$$\lambda_j(t) = 1_{\{N_{jt-}=0\}} \sum_{k_1=0}^{1}\cdots\sum_{k_p=0}^{1}\ \prod_{l=1}^{p} 1_{\{N_{lt-}=k_l\}}\ \alpha^W_{k_p\ldots k_{j+1}0k_{j-1}\ldots k_1,\ k_p\ldots k_{j+1}1k_{j-1}\ldots k_1}(t); \quad t \ge 0, \qquad (4)$$

where $k_p\ldots k_{j+1}0k_{j-1}\ldots k_1$ and $k_p\ldots k_{j+1}1k_{j-1}\ldots k_1$ are base 2 representations of integers.

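Proof. Lemma 3.3 of Aalen (1978) gives an expression of the càdlàg modification $(\lambda_j(t+))$ of the càglàd process $(\lambda_j(t))$. Using the fact that two jumps occur in $(t, t+s]$ with probability $o(s)$, and then the Markov property of $W$, we obtain:

$$\begin{aligned}
\lambda_j(t+) &= \lim_{s \downarrow 0} \tfrac{1}{s} P[N_{j(t+s)} - N_{jt} = 1 \mid \mathcal{F}_t] \\
&= \lim_{s \downarrow 0} \tfrac{1}{s} P[\cap_{l \neq j}\{N_{l(t+s)} = N_{lt}\} \cap \{N_{j(t+s)} - N_{jt} = 1\} \mid \mathcal{F}_t] \\
&= 1_{\{N_{jt}=0\}} \lim_{s \downarrow 0} \tfrac{1}{s} P[\cap_{l \neq j}\{N_{l(t+s)} = N_{lt}\} \cap \{N_{j(t+s)} = 1\} \mid \mathcal{F}_t] \\
&= 1_{\{N_{jt}=0\}} \lim_{s \downarrow 0} \sum_{k_1=0}^{1}\cdots\sum_{k_p=0}^{1} \prod_{l=1}^{p} 1_{\{N_{lt}=k_l\}}\, \tfrac{1}{s} P[\cap_{l \neq j}\{N_{l(t+s)} = k_l\} \cap \{N_{j(t+s)} = 1\} \mid \mathcal{F}_t] \\
&= 1_{\{N_{jt}=0\}} \sum_{k_1=0}^{1}\cdots\sum_{k_p=0}^{1} \prod_{l=1}^{p} 1_{\{N_{lt}=k_l\}} \lim_{s \downarrow 0} \tfrac{1}{s} P[W_{t+s} = k_p\ldots k_{j+1}1k_{j-1}\ldots k_1 \mid W_t = k_p\ldots k_{j+1}0k_{j-1}\ldots k_1] \\
&= 1_{\{N_{jt}=0\}} \sum_{k_1=0}^{1}\cdots\sum_{k_p=0}^{1} \prod_{l=1}^{p} 1_{\{N_{lt}=k_l\}}\, \alpha^W_{k_p\ldots k_{j+1}0k_{j-1}\ldots k_1,\ k_p\ldots k_{j+1}1k_{j-1}\ldots k_1}(t),
\end{aligned}$$

from which the theorem follows.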

A simple way of reading formula (4) is the following: on an event where a jump of $N_j$ implies a jump of $W$ from $h$ to $h'$, we have $\lambda_j(t) = \alpha^W_{hh'}(t)$. The theorem makes it easy to find the intensities of the $N_j$'s as a function of the transition intensities of $X$ defined above. One first finds the intensities in terms of the transition intensities of $W$ and then renames the transition intensities.

Example: The illness-death model. Consider a bivariate counting process $N = (N_1, N_2)$, where $N_1$ counts the onset of the disease and $N_2$ counts the occurrence of death; thus we have $p = 2$. We can form the process $(W_t)$ defined by $W_t = N_{2t}N_{1t}$. This process has four states, but states 2 and 3 have the same biological meaning ("dead"). They are therefore generally grouped to obtain the conventional illness-death model $(X_t)$ defined by $X_t = \min(W_t, 2)$. If $W$ is Markov, applying formula (4) yields for the intensities of $N_1$ and $N_2$ respectively:

$$\lambda_1(t) = 1_{\{N_{1t-}=0\}}1_{\{N_{2t-}=0\}}\,\alpha^W_{01}(t),$$
$$\lambda_2(t) = 1_{\{N_{2t-}=0\}}\big[1_{\{N_{1t-}=0\}}\,\alpha^W_{02}(t) + 1_{\{N_{1t-}=1\}}\,\alpha^W_{13}(t)\big].$$

In terms of the transition intensities of $X$ we obtain:

$$\lambda_1(t) = 1_{\{N_{1t-}=0\}}1_{\{N_{2t-}=0\}}\,\alpha_{01}(t), \qquad (5)$$
$$\lambda_2(t) = 1_{\{N_{2t-}=0\}}\big[1_{\{N_{1t-}=0\}}\,\alpha_{02}(t) + 1_{\{N_{1t-}=1\}}\,\alpha_{12}(t)\big]. \qquad (6)$$
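Formula (4) has a simple computational reading. The product of indicators selects the current state $h$ of $W$, and a jump of $N_j$ moves $W$ from $h$ to $h + 2^{j-1}$. The sketch below combines this with the renaming of $\alpha^W$ into $\alpha$ described earlier in this section. It assumes that $N_p$ counts death, so that all states $h \ge 2^{p-1}$ are absorbing. `alpha_x` is a hypothetical user-supplied function. With constant illness-death intensities the sketch reproduces (5) and (6).

```python
def w_state(n):
    """W_t = N_pt ... N_1t in base 2, with n = (N_1t, ..., N_pt)."""
    return sum(nj << j for j, nj in enumerate(n))

def alpha_w(h, k, t, alpha_x, p):
    """Rename transition intensities of X into those of W.

    States h >= 2^(p-1) are "dead" and absorbing, so transitions out of them are zero;
    all W-states k >= 2^(p-1) are merged into the X-state 2^(p-1).
    """
    half = 2 ** (p - 1)
    if h >= half:
        return 0.0
    return alpha_x(h, min(k, half), t)

def intensities(n_minus, t, alpha_x):
    """lambda_j(t), j = 1..p, from formula (4) given n_minus = (N_1t-, ..., N_pt-).

    Index j is 0-based here, so a jump of N_{j+1} adds 1 << j to W.
    """
    p = len(n_minus)
    h = w_state(n_minus)
    return [0.0 if n_minus[j] == 1 else alpha_w(h, h + (1 << j), t, alpha_x, p)
            for j in range(p)]

# Illness-death model with hypothetical constant intensities of X.
RATES = {(0, 1): 0.10, (0, 2): 0.05, (1, 2): 0.20}

def alpha_x(h, k, t):
    return RATES.get((h, k), 0.0)
```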

The same approach can be applied if some counting processes take more than two values, even though the trick of representing the state value in base 2 then no longer works. A process with more than one jump can represent progression of a disease. For instance, the progressive model of Joly & Commenges (1999) can be represented by a counting process making one jump at HIV infection and another at onset of AIDS. Alternatively, we can represent progression of the disease by two 0-1 counting processes, one counting HIV infection and the other counting onset of AIDS, with the intensity of the latter being equal to zero if the former has not jumped. The 10-state model proposed by Alioum et al. (2005) can be represented by a process describing progression of the HIV disease, which can make two jumps, together with three 0-1 processes counting HIV diagnosis, inclusion in a cohort and death. Alternatively, it can be represented by five 0-1 processes if progression of the disease is represented by the two 0-1 counting processes discussed above. A problem appears if we make the base-2 construction for such 0-1 processes: it leads to "phantom" states which are not relevant, for instance the state AIDS without HIV infection. However, we can still represent the relevant multi-state model by setting the intensity of the transition leading to this state identically equal to zero. Theorem 1 can then still be applied.

We can formalize the relationship between the class of irreversible multi-state (IM) processes and the class of one-jump counting (OJC) processes by saying that they are equivalent. This concept of equivalence between classes of processes is based on the following equivalence relation between processes.

Definition 1 Two processes are informationally equivalent if they generate the same filtration.

For each IM process we can find at least one informationally equivalent OJC process; this is obvious from the Borgan representation. Conversely, for each OJC process we can find an informationally equivalent IM process (using the base 2 representation). We can also define canonical processes, that is, simple representatives of an equivalence class. These can be based on a notion of minimal representation: within the IM class the canonical process is the process with the smallest number of states, and within the OJC class it is the process of lowest dimension. For instance, for the illness-death process the canonical IM process is the three-state process $X$ (rather than the four-state process $W$). The canonical OJC process is the bivariate "basic" counting process (rather than the three-dimensional process obtained from the Borgan representation).

In the next section we will derive the likelihood for the IM-OJC class of processes under increasingly complex schemes of observation. We will also show in some examples that the rigorously derived likelihoods are the same as the heuristic ones.



4 Derivation of the likelihood from Jacod's formula

4.1 Likelihood for counting processes observed in continuous time

From now on we adopt a somewhat more rigorous probabilistic formalism. We assume that a probability space $(\Omega, \mathcal{F}, P)$ is given, and we consider a counting process defined on this space. Jacod (1975) gave the likelihood ratio for observation of a counting process $N$ on $[0, C]$, that is, relative to the $\sigma$-field $\mathcal{F}_C = \mathcal{F}_0 \vee \mathcal{N}_C$, where $\mathcal{N}_C = \sigma(N_{ju}, 0 \le u \le C; j = 1, \ldots, p)$. Aalen (1978), building on results of Jacod & Mémin (1976), gave a simple form of the likelihood ratio in the case of absolutely continuous compensators. He took a reference probability under which the $N_j$'s are independent Poisson processes with intensity 1. Here we consider a multivariate counting process $N$ whose components $N_j$ are 0-1 counting processes. For such processes it is more attractive to take a reference probability $P_0$ under which the $N_j$'s are independent with intensities $\lambda^0_j(t) = 1_{\{N_{jt-}=0\}}$. Equivalently, the jump times of the $N_j$'s are independent with exponential distributions with unit parameter. The likelihood ratio for a probability $P_\theta$ (with $P_\theta \ll P_0$) relative to $P_0$ is:

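$$\mathcal{L}^\theta_{\mathcal{F}_C} = \mathcal{L}^\theta_{\mathcal{F}_0}\Big[\prod_{r=1}^{N_{\cdot C}} \lambda^\theta_{J_r}(T_{(r)})\Big]\exp[-\Lambda^\theta_\cdot(C)]\prod_{j=1}^{p} e^{T_j \wedge C}, \qquad (7)$$

where, for each $r \in \{1, \ldots, N_{\cdot C}\}$, $J_r$ is the unique $j$ such that $\Delta N_{jT_{(r)}} = 1$; $N_{\cdot t} = \sum_{j=1}^p N_{jt}$, $\lambda^\theta_\cdot(t) = \sum_{j=1}^p \lambda^\theta_j(t)$, $\Lambda^\theta_\cdot(t) = \int_0^t \lambda^\theta_\cdot(u)\,du$, and $\lambda^\theta_j(t)$ is the intensity of $N_j$ under $P_\theta$. This formula allows us to compute the likelihood for any multi-state model once we have written it as a multivariate counting process. Within the OJC class, we denote by $T_j$ the jump time of $N_j$. The likelihood can then be written, in a form more manageable for applications, in terms of $\tilde T_j = \min(T_j, C)$ and $\delta_j = 1_{\{T_j < C\}}$ as:

$$\mathcal{L}^\theta_{\mathcal{F}_C} = \mathcal{L}^\theta_{\mathcal{F}_0}\Big[\prod_{j=1}^{p} \lambda^\theta_j(\tilde T_j)^{\delta_j}\Big]\exp[-\Lambda^\theta_\cdot(C)]\prod_{j=1}^{p} e^{\tilde T_j}. \qquad (8)$$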
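The following is a minimal sketch of formula (8) with $\mathcal{L}^\theta_{\mathcal{F}_0} = 1$. It assumes the intensities are supplied by a hypothetical function `intensity(j, t, past)` of the jump times observed strictly before $t$, and approximates $\Lambda^\theta_\cdot(C)$ by the midpoint rule. The illness-death intensities (5)-(6) with constant, illustrative $\alpha$'s serve as an example.

```python
import math

def ojc_loglik(T_obs, delta, C, intensity, steps=20000):
    """Log of formula (8) with L_{F_0} = 1.

    T_obs[j] = min(T_j, C) and delta[j] = 1{T_j < C}.
    intensity(j, t, past) returns lambda_j(t); past[l] is the jump time of N_l if it
    occurred strictly before t, and None otherwise (hypothetical interface).
    """
    p = len(T_obs)

    def past(t):
        return tuple(T_obs[l] if delta[l] and T_obs[l] < t else None for l in range(p))

    ll = sum(math.log(intensity(j, T_obs[j], past(T_obs[j]))) for j in range(p) if delta[j])
    h = C / steps   # midpoint rule for Lambda_.(C) = int_0^C sum_j lambda_j(u) du
    Lam = h * sum(intensity(j, (k + 0.5) * h, past((k + 0.5) * h))
                  for k in range(steps) for j in range(p))
    return ll - Lam + sum(T_obs)    # last term: log prod_j exp(T~_j), from P_0

A01, A02, A12 = 0.10, 0.05, 0.20   # illustrative constant intensities

def illness_death(j, t, past):
    """Markov intensities (5)-(6): index 0 counts illness, index 1 counts death."""
    ill, dead = past[0] is not None, past[1] is not None
    if dead:
        return 0.0
    if j == 0:
        return 0.0 if ill else A01
    return A12 if ill else A02

# Illness at time 2, no death observed by C = 5.
ll = ojc_loglik([2.0, 5.0], [1, 0], 5.0, illness_death)
```

In this example `ll` equals the section 2.2 log-likelihood of the same path plus $\sum_j \tilde T_j = 7$, the $\theta$-free term that comes from the reference measure $P_0$.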
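The term $\mathcal{L}^\theta_{\mathcal{F}_0}$ appears when the initial $\sigma$-field at time 0 contains some information about the process of interest. From now on we suppose that this is not the case, so that $\mathcal{L}^\theta_{\mathcal{F}_0} = 1$ and this term disappears. We may have to compute conditional likelihoods, in particular likelihoods conditional on $\mathcal{F}_{v_0}$, as we have seen in section 2. The conditional likelihood is simply $\mathcal{L}^\theta_{\mathcal{F}_C|\mathcal{F}_{v_0}} = \mathcal{L}^\theta_{\mathcal{F}_C}/\mathcal{L}^\theta_{\mathcal{F}_{v_0}}$. In particular, the likelihood on $\{N_{\cdot v_0} = 0\}$ is equal to $\exp[-\Lambda^\theta_\cdot(v_0) + p v_0]$, so that

$$\mathcal{L}^\theta_{\mathcal{F}_C|\mathcal{F}_{v_0}} = \Big[\prod_{j=1}^{p} \lambda^\theta_j(\tilde T_j)^{\delta_j}\Big]\exp[-(\Lambda^\theta_\cdot(C) - \Lambda^\theta_\cdot(v_0))]\prod_{j=1}^{p} e^{\tilde T_j - v_0}$$

(in this formula the $T_j$'s are the jump times possibly observed after $v_0$).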

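It is interesting to make the link with the heuristic likelihood expressed in terms of transition probabilities and intensities. Dropping the multiplicative factor $\prod_{j=1}^p e^{\tilde T_j}$ (which does not depend on $\theta$) and rearranging the product, the likelihood (7) can be written:

$$\mathcal{L}^\theta_{\mathcal{F}_C} = \Big[\prod_{r=1}^{N_{\cdot C}} \exp\{-[\Lambda^\theta_\cdot(T_{(r)}) - \Lambda^\theta_\cdot(T_{(r-1)})]\}\,\lambda^\theta_{J_r}(T_{(r)})\Big]\exp\{-[\Lambda^\theta_\cdot(C) - \Lambda^\theta_\cdot(T_{(N_{\cdot C})})]\},$$

still with the convention $T_{(0)} = v_0$. We note that the number of jumps called $M$ in section 2.2 is precisely $N_{\cdot C}$; on $\{X_{T_{(r)}} = x_r, r = 1, \ldots, m\}$ we have $\lambda^\theta_{J_r}(T_{(r)}) = \alpha_{x_{r-1}x_r}(T_{(r)})$ and

$$\exp\{-[\Lambda^\theta_\cdot(T_{(r)}) - \Lambda^\theta_\cdot(T_{(r-1)})]\} = \exp\Big[-\int_{T_{(r-1)}}^{T_{(r)}} \sum_j \lambda_j(u)\,du\Big] = \exp\Big[\int_{T_{(r-1)}}^{T_{(r)}} \alpha_{x_{r-1}x_{r-1}}(u)\,du\Big].$$

The latter term is equal to $p_{x_{r-1}x_{r-1}}(T_{(r-1)}, T_{(r)}-)$. In the same way, the last factor is equal to $p_{x_m x_m}(T_{(m)}, C)$, so that we retrieve the expression given in section 2.2.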

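It must be noted that in general the $\lambda^\theta_j(t)$'s and $\Lambda^\theta_\cdot(t)$'s may depend on what has been observed before $t$. To be more explicit, we can write $\lambda^\theta_j(t; T_l \wedge t, l = 1, \ldots, p)$ and $\Lambda^\theta_\cdot(C; \tilde T_l, l = 1, \ldots, p)$.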

The likelihood can be written as:

$$\mathcal{L}^\theta_{\mathcal{F}_C} = f^\theta_C(\tilde T_1, \ldots, \tilde T_p)\prod_{j=1}^{p} e^{\tilde T_j}, \qquad (9)$$

where

$$f^\theta_C(s_1, \ldots, s_p) = \prod_{j=1}^{p}\big[\lambda^\theta_j(s_j; s_l \wedge s_j, l = 1, \ldots, p)\big]^{1_{\{s_j < C\}}}\exp[-\Lambda^\theta_\cdot(C; s_l \wedge C, l = 1, \ldots, p)]$$

is the part of the likelihood which depends on $\theta$. Note that in this expression of the likelihood we have got rid of the $\delta_j$'s, which simplifies the notation for the developments of the next sections. In this expression, if $\tilde T_j = C$ the observation is considered right-censored. The case $\{T_j = C\}$ causes no problem because this event has probability zero under our assumptions and the likelihood is defined almost everywhere.
4.2 Likelihood for the mixed discrete-continuous observation scheme: the case when only one component is partially observed

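In this section, we shall compute the likelihood for the following scheme of observation of $N$: $N_1$ is observed at discrete times $v_0, \ldots, v_m$, while $N_j$, $j = 2, \ldots, p$, are observed in continuous time on $[0, C]$. The observation in this scheme is represented by the $\sigma$-field $\mathcal{O} = \sigma(N_{1v_l}, l = 0, \ldots, m) \vee \mathcal{G}_C$, where $\mathcal{G}_C = \sigma(N_{ju}, j = 2, \ldots, p; 0 \le u \le C)$. We obviously have $\mathcal{O} \subset \mathcal{F}_C$. The observed likelihood can therefore be expressed as $\mathcal{L}^\theta_\mathcal{O} = E_0[\mathcal{L}^\theta_{\mathcal{F}_C} \mid \mathcal{O}]$, where $E_0$ denotes expectation under the probability $P_0$ defined in the previous subsection. We note from formula (9) that $\mathcal{L}^\theta_{\mathcal{F}_C} = f^\theta_C(\tilde T_1, \Gamma)\,g(\Gamma)\,e^{\tilde T_1}$, where $\Gamma = (\tilde T_2, \ldots, \tilde T_p)$ and $g(\Gamma) = \prod_{j=2}^{p} e^{\tilde T_j}$. Here $f^\theta_C(\tilde T_1, \Gamma)$ is a shortcut for $f^\theta_C(\tilde T_1, \tilde T_2, \ldots, \tilde T_p)$; note that $\Gamma$ is a $\mathcal{G}_C$-measurable random variable. Since the $T_j$'s are independent under $P_0$, $(N_{1v_l}, l = 1, \ldots, m)$ and $\mathcal{G}_C$ are independent. The conditional expectation can then be computed using the disintegration theorem (Kallenberg, 2001), which yields:

$$\mathcal{L}^\theta_\mathcal{O} = g(\Gamma)\int f^\theta_C(s \wedge C, \Gamma)\,e^{s \wedge C}\,\nu(ds),$$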

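where $\nu(\cdot)$ is a regular version of the law of $T_1$ given $\mathcal{O}$. By independence, it equals a version of the law of $T_1$ given $\sigma(N_{1v_l}, l = 0, \ldots, m)$. We decompose $\mathcal{L}^\theta_\mathcal{O}$ on the atoms (see a definition in section 4.4) of $\sigma(N_{1v_l}, l = 0, \ldots, m)$ as:

$$\mathcal{L}^\theta_\mathcal{O} = \sum_{l=1}^{m} 1_{\{v_{l-1} < T_1 \le v_l\}}E_0[\mathcal{L}^\theta_{\mathcal{F}_C} \mid \mathcal{O}] + 1_{\{T_1 > v_m\}}E_0[\mathcal{L}^\theta_{\mathcal{F}_C} \mid \mathcal{O}], \qquad (10)$$

with the convention $v_0 = 0$. The law of $T_1$ given $(T_1 \in (v_{l-1}, v_l])$ is $\nu(ds) = 1_{(v_{l-1}, v_l]}(s)\,\dfrac{e^{-s}}{e^{-v_{l-1}} - e^{-v_l}}\,ds$. Using this, we obtain for the first terms:

$$\begin{aligned}
1_{\{v_{l-1} < T_1 \le v_l\}}E_0[\mathcal{L}^\theta_{\mathcal{F}_C} \mid \mathcal{O}] &= 1_{\{v_{l-1} < T_1 \le v_l\}}\,g(\Gamma)\,E_0[f^\theta_C(T_1, \Gamma)e^{T_1} \mid \Gamma, T_1 \in (v_{l-1}, v_l]] \\
&= 1_{\{v_{l-1} < T_1 \le v_l\}}\,\frac{g(\Gamma)}{e^{-v_{l-1}} - e^{-v_l}}\int_{v_{l-1}}^{v_l} f^\theta_C(s, \Gamma)\,ds. \qquad (11)
\end{aligned}$$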



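We now consider the last term of (10). $\nu(ds) = 1_{(v_m, +\infty)}(s)\,e^{-s}e^{v_m}\,ds$ is a regular version of the law of $T_1$ given $(T_1 > v_m)$, and we can write

$$\begin{aligned}
1_{\{T_1 > v_m\}}E_0[\mathcal{L}^\theta_{\mathcal{F}_C} \mid \mathcal{O}] &= 1_{\{T_1 > v_m\}}\,g(\Gamma)\,E_0[f^\theta_C(\tilde T_1, \Gamma)e^{\tilde T_1} \mid \Gamma, T_1 > v_m] \\
&= 1_{\{T_1 > v_m\}}\,g(\Gamma)\,e^{v_m}\Big[\int_{v_m}^{C} f^\theta_C(s, \Gamma)\,ds + \int_{C}^{+\infty} f^\theta_C(C, \Gamma)\,e^{C-s}\,ds\Big] \\
&= 1_{\{T_1 > v_m\}}\,g(\Gamma)\,e^{v_m}\Big[\int_{v_m}^{C} f^\theta_C(s, \Gamma)\,ds + f^\theta_C(C, \Gamma)\Big]. \qquad (12)
\end{aligned}$$

Combining equations (10), (11) and (12), we have proved

Lemma 1 For $N_1$ observed at discrete times $v_0, \ldots, v_m$ and $N_j$ ($2 \le j \le p$) observed in continuous time, the likelihood is given by

$$\mathcal{L}^\theta_\mathcal{O} = \sum_{l=1}^{m} 1_{\{v_{l-1} < T_1 \le v_l\}}\,\frac{g(\Gamma)}{e^{-v_{l-1}} - e^{-v_l}}\int_{v_{l-1}}^{v_l} f^\theta_C(s, \Gamma)\,ds + 1_{\{T_1 > v_m\}}\,g(\Gamma)\,e^{v_m}\Big[\int_{v_m}^{C} f^\theta_C(s, \Gamma)\,ds + f^\theta_C(C, \Gamma)\Big]. \qquad (13)$$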
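To see Lemma 1 at work, the sketch below takes the Markov illness-death model with constant intensities (hypothetical values). It evaluates the second term of (13) by numerical integration of $f^\theta_C$, dropping the $\theta$-free factor $g(\Gamma)e^{v_m}$. On $\{T_1 > v_m\}$ the result agrees, up to quadrature error, with the heuristic formula (2) with $v_0 = 0$. Censored times are encoded as $s = C$, following (9); the function names are otherwise illustrative.

```python
import math

A01, A02, A12 = 0.10, 0.05, 0.20   # hypothetical constant intensities
A0 = A01 + A02

def f_C(s1, s2, C):
    """theta-dependent part of (9) for the illness-death model with intensities (5)-(6).

    s1 = T~_1 (illness) and s2 = T~_2 (death); a value equal to C means right-censored.
    """
    val = 1.0
    if s1 < C:                                  # illness observed at s1
        val *= A01 if s1 < s2 else 0.0          # lambda_1 vanishes after death
    if s2 < C:                                  # death observed at s2
        val *= A12 if s1 < s2 else A02
    Lam = A0 * min(s1, s2) + A12 * max(0.0, s2 - s1)   # Lambda_.(C)
    return val * math.exp(-Lam)

def lemma1_last_term(vm, s2, C, n=20000):
    """Second term of (13) on {T_1 > v_m}, without the theta-free factor g(Gamma) e^{v_m}."""
    h = (C - vm) / n
    integral = h * sum(f_C(vm + (k + 0.5) * h, s2, C) for k in range(n))
    return integral + f_C(C, s2, C)

def formula_2(vm, T, d):
    """Heuristic formula (2) with v_0 = 0 and death indicator d."""
    def p00(u):
        return math.exp(-A0 * u)

    def p01(u):
        return A01 / (A0 - A12) * (math.exp(-A12 * u) - math.exp(-A0 * u))

    return p00(vm) * (p00(T - vm) * A02 ** d + p01(T - vm) * A12 ** d)

# Healthy at the last visit v_m = 2, death observed at 6, end of follow-up C = 10.
rigorous = lemma1_last_term(2.0, 6.0, 10.0)
heuristic = formula_2(2.0, 6.0, 1)
```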



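Example: The Markov illness-death model. Let us apply the formula to a Markov illness-death model. For brevity, we shall compute the likelihood only on the event $\{v_{l-1} < T_1 \le v_l\} \cap \{T_2 > C\}$, that is, in epidemiological language, when a subject has been first seen ill at $v_l$ and was still alive at $C$. On this event $N_{\cdot C} = 1$, and $f^\theta_C(T_1, \Gamma) = \lambda_1(T_1)e^{-\Lambda_\cdot(C)}$ (here $\Gamma = \tilde T_2$, and $\tilde T_2 = C$ on this event). We decompose the cumulative total intensity as

$$\Lambda_\cdot(C) = \Lambda_\cdot(v_{l-1}) + [\Lambda_\cdot(T_1) - \Lambda_\cdot(v_{l-1})] + [\Lambda_\cdot(v_l) - \Lambda_\cdot(T_1)] + [\Lambda_\cdot(C) - \Lambda_\cdot(v_l)].$$

In the Markov model, formulae (5) and (6) give $\Lambda_\cdot(v_{l-1}) = A_{0\cdot}(0, v_{l-1})$, $\Lambda_\cdot(T_1) - \Lambda_\cdot(v_{l-1}) = A_{0\cdot}(v_{l-1}, T_1)$, $\Lambda_\cdot(v_l) - \Lambda_\cdot(T_1) = A_{12}(T_1, v_l)$ and $\Lambda_\cdot(C) - \Lambda_\cdot(v_l) = A_{12}(v_l, C)$. Here $A_{ij}(a, b) = \int_a^b \alpha_{ij}(s)\,ds$ and $A_{0\cdot} = A_{01} + A_{02}$. Taking out of the integral the terms which do not depend on $s$, we obtain:

$$\int_{v_{l-1}}^{v_l} f^\theta_C(s, \Gamma)\,ds = e^{-A_{0\cdot}(0, v_{l-1})}\Big[\int_{v_{l-1}}^{v_l} e^{-A_{0\cdot}(v_{l-1}, s)}\,\alpha_{01}(s)\,e^{-A_{12}(s, v_l)}\,ds\Big]e^{-A_{12}(v_l, C)}.$$

This is a product of three terms in which we recognize $p_{00}(0, v_{l-1})$, $p_{01}(v_{l-1}, v_l)$ and $p_{11}(v_l, C)$ (Commenges et al., 2004). Conditioning on $\{N_{\cdot v_0} = 0\}$, we have to divide by $e^{-A_{0\cdot}(0, v_0)}$, and we retrieve formula (3).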
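The factorization in this example can be checked numerically. The sketch below assumes constant intensities (illustrative values). It computes $\int_{v_{l-1}}^{v_l} f^\theta_C(s, \Gamma)\,ds$ with the midpoint rule and compares it with the closed-form product $p_{00}(0, v_{l-1})\,p_{01}(v_{l-1}, v_l)\,p_{11}(v_l, C)$.

```python
import math

A01, A02, A12 = 0.10, 0.05, 0.20   # illustrative constant intensities
A0 = A01 + A02

def f_C_ill_alive(s, C):
    """f_C(s, Gamma) on {T_1 = s} and {T_2 > C}: lambda_1(s) exp(-Lambda_.(C)),
    with Lambda_.(C) = A_0.(0, s) + A_12(s, C) for constant intensities."""
    return A01 * math.exp(-A0 * s - A12 * (C - s))

def midpoint(f, a, b, n=10000):
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

def p00(u):
    return math.exp(-A0 * u)

def p11(u):
    return math.exp(-A12 * u)

def p01(u):
    return A01 / (A0 - A12) * (math.exp(-A12 * u) - math.exp(-A0 * u))

v_prev, v_l, C = 1.0, 3.0, 5.0     # healthy at 1, first seen ill at 3, alive at 5
lhs = midpoint(lambda s: f_C_ill_alive(s, C), v_prev, v_l)
rhs = p00(v_prev) * p01(v_l - v_prev) * p11(C - v_l)
```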



4.3 Likelihood for the mixed discrete-continuous observation scheme: the case when several components are partially observed

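In this section, we shall first compute the likelihood for the following scheme of observation of $N$: $N_1$ is observed at discrete times $v^1_0, \ldots, v^1_{m_1}$, and $N_2$ is observed at discrete times $v^2_0, \ldots, v^2_{m_2}$, while $N_j$, $j = 3, \ldots, p$, are observed in continuous time on $[0, C]$. See section 4.5 for the problem raised by the randomness of the number of observations. The observation in this scheme is represented by the $\sigma$-field $\mathcal{O} = \sigma(N_{1v^1_{l_1}}, N_{2v^2_{l_2}}, 0 \le l_1 \le m_1, 0 \le l_2 \le m_2) \vee \mathcal{G}_C$, where $\mathcal{G}_C = \sigma(N_{ju}, j = 3, \ldots, p; 0 \le u \le C)$. Using (9) we write $\mathcal{L}^\theta_{\mathcal{F}_C} = f^\theta_C(\tilde T_1, \tilde T_2, \Gamma')\,g(\Gamma')\,e^{\tilde T_1 + \tilde T_2}$, where $\Gamma' = (\tilde T_3, \ldots, \tilde T_p)$ is a $\mathcal{G}_C$-measurable random variable and $g(\Gamma') = \prod_{j=3}^p e^{\tilde T_j}$. Let $\mathcal{O}'$ denote the $\sigma$-field $\sigma(N_{1v^1_{l_1}}, l_1 = 0, \ldots, m_1) \vee \sigma(N_{ju}, j = 2, \ldots, p; 0 \le u \le C)$, which we denoted $\mathcal{O}$ in the previous subsection. We obviously have $\mathcal{O} \subset \mathcal{O}' \subset \mathcal{F}_C$, so the observed likelihood can be expressed as $\mathcal{L}^\theta_\mathcal{O} = E_0[E_0[\mathcal{L}^\theta_{\mathcal{F}_C} \mid \mathcal{O}'] \mid \mathcal{O}]$. Lemma 1 gives an expression of $E_0[\mathcal{L}^\theta_{\mathcal{F}_C} \mid \mathcal{O}']$. Using the same properties as above and the convention $v^1_0 = v^2_0 = 0$, we can write: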



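$$\begin{aligned}
\mathcal{L}^\theta_\mathcal{O} = {}& \sum_{l_1=1}^{m_1}\sum_{l_2=1}^{m_2} 1_{\{v^1_{l_1-1} < T_1 \le v^1_{l_1}\}}1_{\{v^2_{l_2-1} < T_2 \le v^2_{l_2}\}}\frac{g(\Gamma')}{e^{-v^1_{l_1-1}} - e^{-v^1_{l_1}}}E_0\Big[e^{T_2}\int_{v^1_{l_1-1}}^{v^1_{l_1}} f^\theta_C(s_1, T_2, \Gamma')\,ds_1 \,\Big|\, \mathcal{O}\Big] \\
&+ \sum_{l_1=1}^{m_1} 1_{\{v^1_{l_1-1} < T_1 \le v^1_{l_1}\}}1_{\{T_2 > v^2_{m_2}\}}\frac{g(\Gamma')}{e^{-v^1_{l_1-1}} - e^{-v^1_{l_1}}}E_0\Big[e^{\tilde T_2}\int_{v^1_{l_1-1}}^{v^1_{l_1}} f^\theta_C(s_1, \tilde T_2, \Gamma')\,ds_1 \,\Big|\, \mathcal{O}\Big] \\
&+ \sum_{l_2=1}^{m_2} 1_{\{v^2_{l_2-1} < T_2 \le v^2_{l_2}\}}1_{\{T_1 > v^1_{m_1}\}}\,g(\Gamma')\,e^{v^1_{m_1}}E_0\Big[e^{T_2}\Big(\int_{v^1_{m_1}}^{C} f^\theta_C(s_1, T_2, \Gamma')\,ds_1 + f^\theta_C(C, T_2, \Gamma')\Big) \,\Big|\, \mathcal{O}\Big] \\
&+ 1_{\{T_1 > v^1_{m_1}\}}1_{\{T_2 > v^2_{m_2}\}}\,g(\Gamma')\,e^{v^1_{m_1}}E_0\Big[e^{\tilde T_2}\Big(\int_{v^1_{m_1}}^{C} f^\theta_C(s_1, \tilde T_2, \Gamma')\,ds_1 + f^\theta_C(C, \tilde T_2, \Gamma')\Big) \,\Big|\, \mathcal{O}\Big].
\end{aligned}$$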



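We compute each term of the right-hand side of this equality using the law and the independence of the $T_j$'s under $P_0$, as in the previous subsection. Dropping the factors which do not depend on $\theta$ (such as $g(\Gamma')$, the normalizing constants $(e^{-v^k_{l-1}} - e^{-v^k_l})^{-1}$ and the factors $e^{v^k_{m_k}}$), we obtain the following lemma.

Lemma 2 For $N_1$ observed at discrete times $v^1_0, \ldots, v^1_{m_1}$, $N_2$ observed at discrete times $v^2_0, \ldots, v^2_{m_2}$, and $N_j$ ($3 \le j \le p$) observed in continuous time, the likelihood is given, up to factors which do not depend on $\theta$, by

$$\begin{aligned}
\mathcal{L}^\theta_\mathcal{O} = {}& \sum_{l_1=1}^{m_1}\sum_{l_2=1}^{m_2} 1_{\{v^1_{l_1-1} < T_1 \le v^1_{l_1}\}}1_{\{v^2_{l_2-1} < T_2 \le v^2_{l_2}\}}\int_{v^1_{l_1-1}}^{v^1_{l_1}}\int_{v^2_{l_2-1}}^{v^2_{l_2}} f^\theta_C(s_1, s_2, \Gamma')\,ds_2\,ds_1 \\
&+ \sum_{l_1=1}^{m_1} 1_{\{v^1_{l_1-1} < T_1 \le v^1_{l_1}\}}1_{\{T_2 > v^2_{m_2}\}}\int_{v^1_{l_1-1}}^{v^1_{l_1}}\Big[\int_{v^2_{m_2}}^{C} f^\theta_C(s_1, s_2, \Gamma')\,ds_2 + f^\theta_C(s_1, C, \Gamma')\Big]ds_1 \\
&+ \sum_{l_2=1}^{m_2} 1_{\{v^2_{l_2-1} < T_2 \le v^2_{l_2}\}}1_{\{T_1 > v^1_{m_1}\}}\int_{v^2_{l_2-1}}^{v^2_{l_2}}\Big[\int_{v^1_{m_1}}^{C} f^\theta_C(s_1, s_2, \Gamma')\,ds_1 + f^\theta_C(C, s_2, \Gamma')\Big]ds_2 \\
&+ 1_{\{T_1 > v^1_{m_1}\}}1_{\{T_2 > v^2_{m_2}\}}\Big[\int_{v^1_{m_1}}^{C}\int_{v^2_{m_2}}^{C} f^\theta_C(s_1, s_2, \Gamma')\,ds_2\,ds_1 + \int_{v^1_{m_1}}^{C} f^\theta_C(s_1, C, \Gamma')\,ds_1 + \int_{v^2_{m_2}}^{C} f^\theta_C(C, s_2, \Gamma')\,ds_2 + f^\theta_C(C, C, \Gamma')\Big].
\end{aligned}$$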



We can prove by induction a formula for the likelihood when $k$ components of the process are partially observed. The large number of possibilities for the realization of the $(T_j)_{1 \le j \le k}$ in the intervals $(v^j_{l-1}, v^j_l]$ or $(v^j_{m_j}, +\infty)$ makes the formula complicated. Lemma 3 in the next subsection makes it unnecessary to give such a cumbersome formula.

4.4 General formula of the likelihood via its local representation

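Here we develop an approach which is closer to the statistician's point of view and which is more general. We begin with a lemma.

Lemma 3 Consider two observation schemes of $N$ yielding observed $\sigma$-fields $\mathcal{O}$ and $\tilde{\mathcal{O}}$. Consider an event $A$ such that $P_0(A) > 0$, $A \in \mathcal{O} \cap \tilde{\mathcal{O}}$ and $A \cap \mathcal{O} = A \cap \tilde{\mathcal{O}}$, where $A \cap \mathcal{O}$ denotes the trace $\sigma$-field $\{A \cap B : B \in \mathcal{O}\}$. Then $\mathcal{L}^\theta_\mathcal{O} = \mathcal{L}^\theta_{\tilde{\mathcal{O}}}$ a.s. on $A$.

Proof. Recall that $\mathcal{L}^\theta_\mathcal{O} = E_0[\mathcal{L}^\theta_{\mathcal{F}_C} \mid \mathcal{O}]$ and $\mathcal{L}^\theta_{\tilde{\mathcal{O}}} = E_0[\mathcal{L}^\theta_{\mathcal{F}_C} \mid \tilde{\mathcal{O}}]$. Direct application of a lemma of local equality of conditional expectations (Kallenberg, 2001: Lemma 2, Chapter 6) gives the result.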

For instance, consider the case where $N_1$ is observed at discrete times $v_0, \ldots, v_m$ and $N_j$, $j > 1$, are observed in continuous time, as in section 4.2, and take $A = \{T_1 \in (v_{l-1}, v_l]\}$. We have $\mathcal{L}^\theta_\mathcal{O} = \frac{g(\Gamma)}{e^{-v_{l-1}} - e^{-v_l}}\int_{v_{l-1}}^{v_l} f^\theta_C(s, \Gamma)\,ds$ a.s. on $A$. This can be seen by multiplying both sides of equation (13) by $1_A$, which is equal to 1 on $A$. From a statistician's point of view, we can say that if $A$ happens the likelihood takes that particular form, which does not depend on the other observation times ($v_{l'}$, $l' \neq l, l-1$). Consider another scheme of observation where $N_1$ is observed at $w_1 = v_{l-1}$ and $w_2 = v_l$, leading to the likelihood $\mathcal{L}^\theta_{\tilde{\mathcal{O}}}$. It is then obvious, either directly or using Lemma 3, that $\mathcal{L}^\theta_\mathcal{O} = \mathcal{L}^\theta_{\tilde{\mathcal{O}}}$ a.s. on $A$.

This leads to an extension of the field of application of the formulae of the 
likelihood for incomplete data. A more general scheme of observation can be 
considered: 

Definition 2 (The deterministic GCMP) A deterministic GCMP is a scheme of observation for a multivariate process $X = (X_1, \ldots, X_p)$ specified by a response function $r(\cdot) = (r_1(\cdot), \ldots, r_p(\cdot))$, where the $r_j(\cdot)$'s take values 0 or 1, such that $X_{jt}$ is observed at time $t$ if and only if $r_j(t) = 1$, for $j = 1, \ldots, p$.
This general (deterministic) scheme applied to $N$ allows each component $N_j$ to be observed in continuous time over some windows and in discrete time over other windows. Within this scheme (assuming a family of equivalent probability measures), the likelihood is given by $E_0[\mathcal{L}^\theta_{\mathcal{F}_C} \mid \mathcal{O}]$, where $\mathcal{O} = \sigma(r_j(t)N_{jt}; 0 \le t \le C; j \in \{1, \ldots, p\})$. Lemma 3 will help us give a simple expression of this likelihood if we can find a finite class of events $(A_k)$ which form a partition of $\Omega$ and on which the likelihood is relatively easy to compute. If all the observed events had a positive (and bounded away from zero) probability, the class of atoms of $\mathcal{O}$ would yield a natural finite partition of $\Omega$. (A $P$-atom of a $\sigma$-field is a set $A$ belonging to it such that $P(A) > 0$ and, for every $B$ in the $\sigma$-field with $B \subset A$, either $P(B) = 0$ or $P(A \setminus B) = 0$.) For instance, this would be the case if all the components of $N$ were observed in discrete time. As soon as one component may be observed in continuous time, atoms may still exist, but they no longer form a partition of $\Omega$. This leads us to define a class of pseudo-atoms which is a partition of $\Omega$. For $1 \le j \le p$, we denote:

- $v^j_0 = 0$ and $\{v^j_1,\ldots,v^j_{m_j}\}$ the finite set of discontinuity times of the function $r_j(t)$ on $[0, C]$, ordered so that $v^j_0 = 0 < v^j_1 < \cdots < v^j_{m_j}$.

- If $m_j > 0$ we define $D_{jl} = \{T_j \in (v^j_{l-1}, v^j_l]\} \cap \{r_j(u) = 0,\ u \in (v^j_{l-1}, v^j_l)\}$, $l \ge 1$. For some $l$, $D_{jl}$ is empty; otherwise it is an atom of $\mathcal{O}_j = \sigma(r_j(t)N_{jt};\ 0 \le t \le C)$.

- $E_j = (\{T_j > v^j_{m_j}\} \cap \{r_j(u) = 0,\ u > v^j_{m_j}\}) \cup \{T_j > C\}$, which is also either empty or an atom of $\mathcal{O}_j$.

- $C_j = \{T_j \in \mathring{r}_j^{-1}(1)\}$, where $\mathring{r}_j^{-1}(1)$ denotes the interior (in the topological sense) of the subset $r_j^{-1}(1)$ of $\mathbb{R}$ in $[0, v^j_{m_j}]$. On $C_j$, the time $T_j$ is exactly observed, and $C_j$ is the complement of $(\bigcup_{1 \le l < \infty} D_{jl}) \cup E_j$ in $\Omega$.

$\Omega$ can be partitioned into the disjoint sets $D_{jl}$, $l \ge 1$, $C_j$ and $E_j$ for each $j$ (there is no such $D_{jl}$ if $N_j$ is observed in continuous time over $[0, C]$; also $C_j$ is empty if $N_j$ is only observed at discrete times). A finer partition can be obtained by intersecting these $p$ partitions.
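To make the construction concrete, here is a sketch that computes the discontinuity times of a deterministic response function and tells in which of the sets $D_{jl}$, $C_j$ or $E_j$ a given value of $T_j$ falls. Encoding $r_j$ as disjoint closed windows of continuous observation plus isolated visit times is an assumption made for the sketch (the function names are hypothetical); it covers the schemes discussed in this section but not every conceivable $r_j$, and boundary points, which have probability zero, are not treated specially.

```python
def discontinuities(windows, visits, C):
    """Discontinuity times v_1 < ... < v_m of r_j on [0, C].

    Assumed encoding: r_j = 1 on each closed window [a, b] (continuous observation)
    and at each isolated visit time; r_j = 0 elsewhere.
    """
    points = set(visits)
    for a, b in windows:
        if a > 0:
            points.add(a)
        if b < C:
            points.add(b)
    return sorted(points)

def classify(t, windows, visits, C):
    """Return ('C',), ('D', l) or ('E',): the set of the partition containing {T_j = t}."""
    if t > C:
        return ("E",)
    if any(a < t < b for a, b in windows):
        return ("C",)  # interior of r_j^{-1}(1): T_j is exactly observed
    v = [0.0] + discontinuities(windows, visits, C)
    for l in range(1, len(v)):
        if v[l - 1] < t <= v[l]:
            return ("D", l)  # r_j = 0 on (v_{l-1}, v_l): only the interval is known
    return ("E",)  # r_j = 0 after the last discontinuity: only T_j > v_m is known
```

For instance, with $r_j = 1$ on $[0, 1]$ and at visits 2, 3, 4, and $C = 5$, the value $t = 1.5$ falls in $D_{j2}$ and $t = 4.5$ in $E_j$.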

Definition 3 (Pseudo-atoms) In the deterministic GCMP framework, we call pseudo-atom of $\mathcal{O}$ a set $A = \bigcap_{j=1}^p A_j$, where $A_j = D_{jl}$ for some $l$, $A_j = C_j$ or $A_j = E_j$, and such that $P_0(A) > 0$.

Remark. If $A$ is a pseudo-atom of $\mathcal{O}$ we have $A \in \mathcal{O}$. The class of pseudo-atoms $(A_k)$ forms a partition of $\Omega$: $\Omega = \bigcup_k A_k$ and $A_k \cap A_{k'} = \emptyset$ if $k \ne k'$.

Example: The illness-death model with hybrid observation scheme for illness: pseudo-atoms. Consider an illness-death model in which illness represents a complication of a disease. The occurrence of the complication (illness) is observed in continuous time during a stay in hospital from time 0 to time $v_1$. After the hospitalization, the complication is diagnosed at planned visits at times $v_2, v_3, \ldots, v_m < C$. We suppose that if death occurs during the study (i.e. in $[0, C]$) its time can be retrieved exactly. So the observation process is given by $r_1(t) = 1_{\{0 \le t \le v_1\}} + \sum_{k=2}^{m} 1_{\{t = v_k\}}$ and $r_2(t) = 1$, $0 \le t \le C$. With this scheme, the discontinuity times and the sets defined previously are:

- For the first component $r_1(\cdot)$, $m_1 = m$ and $v^1_k = v_k$. For the second one, $m_2 = 0$.

- $D_{11}$ is empty and, for $2 \le l \le m$, $D_{1l} = \{T_1 \in (v_{l-1}, v_l]\}$. If the event $D_{1l}$ occurs for some $l \ge 2$, the complication is diagnosed for the first time at visit $v_l$. Since $m_2 = 0$ there is no $D_{2l}$.

- $E_1 = \{T_1 > v_m\}$ and $E_2 = \{T_2 > C\}$.

- $C_1 = \{T_1 < v_1\}$. If $C_1$ occurs, the illness is diagnosed during the stay in hospital. We have $C_2 = \{T_2 \le C\}$.
Note that $\Omega = C_1 \cup (\bigcup_{l=2}^{m} D_{1l}) \cup E_1$ gives a partition of $\Omega$, $\Omega = C_2 \cup E_2$ gives another one, and the pseudo-atoms are the intersections of sets of these two partitions that have positive probability. We can now give the likelihood by describing it on each pseudo-atom.
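The pseudo-atoms of this example can be enumerated mechanically. The sketch below (hypothetical function names; it assumes every intersection has positive probability and writes "&" for $\cap$) lists the $2(m+1)$ candidate pseudo-atoms and identifies the one containing given realized times $(T_1, T_2)$, with `v` holding $v_1,\ldots,v_m$.

```python
from itertools import product

def pseudo_atom_labels(m):
    """Candidate pseudo-atoms: illness sets C1, D1,2..D1,m, E1 (D1,1 is empty) times C2, E2."""
    illness = ["C1"] + [f"D1,{l}" for l in range(2, m + 1)] + ["E1"]
    death = ["C2", "E2"]
    return [f"{a} & {b}" for a, b in product(illness, death)]

def atom_of(t1, t2, v, C):
    """Pseudo-atom containing the realized times (t1, t2); v = [v_1, ..., v_m]."""
    if t1 <= v[0]:
        a = "C1"  # illness during the hospital stay (t1 == v_1 has probability zero)
    elif t1 > v[-1]:
        a = "E1"  # no diagnosis by the last visit
    else:
        l = next(k for k in range(2, len(v) + 1) if v[k - 2] < t1 <= v[k - 1])
        a = f"D1,{l}"  # first diagnosed at visit v_l
    b = "C2" if t2 <= C else "E2"
    return f"{a} & {b}"

print(len(pseudo_atom_labels(4)))  # 10
```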

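Theorem 2 Consider the deterministic GCMP scheme specified by $r_j(\cdot)$, $j = 1,\ldots,p$, and a pseudo-atom of $\mathcal{O}$, $A = \bigcap_{l\in L}\{T_l \in (v^l_1, v^l_2]\} \cap \bigcap_{l'\in L'}\{T_{l'} > v^{l'}_{m_{l'}}\} \cap \bigcap_{l''\in L''}\{T_{l''} \in \mathring{r}_{l''}^{-1}(1)\}$, where $L$ and $L'$ are disjoint subsets of $\{1,\ldots,p\}$ and $L'' = \overline{L \cup L'}$. On $A$ the likelihood $\mathcal{L}_{\mathcal{O}}$ is equal to that of an MDCO scheme where $N_l$ is observed at times $v^l_1$ and $v^l_2$ for $l \in L$, at $v^{l'}_{m_{l'}}$ for $l' \in L'$, and in continuous time for $l'' \in L''$. Without loss of generality, we can suppose that $L'' = \{k+1,\ldots,p\}$. On $A$ this likelihood is equal to:

$$
\begin{aligned}
\mathcal{L}_{\mathcal{O}} = {}& \frac{\exp\big(\sum_{l''\in L''}\tilde T_{l''}\big)}{\prod_{l\in L}\big(e^{-v^{l}_{1}}-e^{-v^{l}_{2}}\big)\prod_{l'\in L'}e^{-v^{l'}_{m_{l'}}}}\\
&\times\Bigg[\int\prod_{l\in L}1_{\{v^{l}_{1}<s_l<v^{l}_{2}\}}\prod_{l'\in L'}1_{\{v^{l'}_{m_{l'}}<s_{l'}<C\}}\,f_C(s_1,\ldots,s_k,\tilde T_{k+1},\ldots,\tilde T_p)\prod_{l\in L\cup L'}ds_l\\
&\quad+\sum_{l'\in L'}\int\prod_{l\in L}1_{\{v^{l}_{1}<s_l<v^{l}_{2}\}}\prod_{h\in L'\setminus\{l'\}}1_{\{v^{h}_{m_{h}}<s_{h}<C\}}\,f_C(s_1,\ldots,s_{l'-1},C,s_{l'+1},\ldots,s_k,\tilde T_{k+1},\ldots,\tilde T_p)\prod_{l\in(L\cup L')\setminus\{l'\}}ds_l\\
&\quad+\cdots+\int\prod_{l\in L}1_{\{v^{l}_{1}<s_l<v^{l}_{2}\}}\,f_C(s_1,\ldots,s_k,\tilde T_{k+1},\ldots,\tilde T_p)\Big|_{s_{l'}=C,\ l'\in L'}\prod_{l\in L}ds_l\Bigg],
\end{aligned}
$$

where the successive groups of terms set $s_{l'} = C$ for one, two, ..., and finally all the components $l' \in L'$.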

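Proof. Let $\tilde{\mathcal{O}}$ be the observed σ-field in the MDCO scheme of the theorem. We have $A \in \mathcal{O}$, $A \in \tilde{\mathcal{O}}$ and $A \cap \mathcal{O} = A \cap \tilde{\mathcal{O}}$. Lemma 3 gives $\mathcal{L}_{\mathcal{O}} = \mathcal{L}_{\tilde{\mathcal{O}}}$ a.s. on $A$. The value of the likelihood on this event can be derived using the technique of the preceding subsection.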

Example: The illness-death model with hybrid observation scheme for illness: likelihood. The partition of $\Omega$ given by the class of pseudo-atoms is $\{C_1 \cap C_2;\ D_{1l} \cap C_2,\ l = 2,\ldots,m;\ E_1 \cap C_2;\ C_1 \cap E_2;\ D_{1l} \cap E_2,\ l = 2,\ldots,m;\ E_1 \cap E_2\}$. In this case we have

$$f_C(s_1,s_2) = \lambda_1(s_1; s_1, s_2\wedge s_1)^{1_{\{s_1<C\}}}\,\lambda_2(s_2; s_1\wedge s_2, s_2)^{1_{\{s_2<C\}}}\exp[-\Lambda_\cdot(C; s_1\wedge C, s_2\wedge C)].$$

The likelihood is easy to write on each of the pseudo-atoms.

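- On $C_1 \cap C_2$, all the components are observed in continuous time ($L$ and $L'$ are empty): $\mathcal{L}_{\mathcal{O}} = e^{\tilde T_1 + \tilde T_2} f_C(\tilde T_1, \tilde T_2)$;

- On $D_{1l} \cap C_2$, $L = \{1\}$ and $L'' = \{2\}$: $\mathcal{L}_{\mathcal{O}} = \dfrac{e^{\tilde T_2}}{e^{-v_{l-1}} - e^{-v_l}}\displaystyle\int_{v_{l-1}}^{v_l} f_C(s_1, \tilde T_2)\,ds_1$;

- On $E_1 \cap C_2$, $\mathcal{L}_{\mathcal{O}} = e^{v_m + \tilde T_2}\big[\int_{v_m}^{C} f_C(s_1, \tilde T_2)\,ds_1 + f_C(C, \tilde T_2)\big]$. We can remark that the intensity $\lambda_1(t)$ ($\lambda_2(t)$) of occurrence of illness (death) vanishes after death, so the likelihood is also equal to $\mathcal{L}_{\mathcal{O}} = e^{v_m + \tilde T_2}\big[\int_{v_m}^{\tilde T_2} f_C(s_1, \tilde T_2)\,ds_1 + f_C(\tilde T_2, \tilde T_2)\big]$;

- On $C_1 \cap E_2$, all the components are observed in continuous time on $[0, C]$: $\mathcal{L}_{\mathcal{O}} = e^{\tilde T_1 + C} f_C(\tilde T_1, C)$;

- On $D_{1l} \cap E_2$, $L = \{1\}$ and $L'' = \{2\}$: $\mathcal{L}_{\mathcal{O}} = \dfrac{e^{C}}{e^{-v_{l-1}} - e^{-v_l}}\displaystyle\int_{v_{l-1}}^{v_l} f_C(s_1, C)\,ds_1$;

- On $E_1 \cap E_2$, one finds $\mathcal{L}_{\mathcal{O}} = e^{v_m + C}\big[\int_{v_m}^{C} f_C(s_1, C)\,ds_1 + f_C(C, C)\big]$.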
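These expressions can be evaluated numerically. The sketch below assumes constant hazards (healthy to ill `A`, healthy to dead `B`, ill to dead `G`; all values hypothetical), encodes a death censored at $C$ by $\tilde T_2 = C$, and drops the exponential factors in front of the integrals since they do not involve the intensities; the midpoint rule stands in for whatever quadrature one would use in practice.

```python
import math

# Hypothetical constant hazards (illustration only):
# healthy -> ill A, healthy -> dead B, ill -> dead G.
A, B, G = 0.1, 0.05, 0.2

def f_c(s1, s2, C):
    """f_C(s1, s2) of the illness-death model; s_j == C encodes censoring at C."""
    lam1 = A if s1 <= s2 else 0.0   # illness cannot occur after death
    lam2 = G if s1 < s2 else B      # death rate depends on the illness status
    cum = (A + B) * min(s1, s2) + G * max(0.0, s2 - s1)  # Lambda_.(C; s1 ^ C, s2 ^ C)
    return (lam1 if s1 < C else 1.0) * (lam2 if s2 < C else 1.0) * math.exp(-cum)

def midpoint(g, a, b, n=2000):
    """Composite midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

# Factors of L_O depending on the intensities; t2 == C encodes E_2.
def on_c1(t1, t2, C):
    """C_1 & C_2 or C_1 & E_2: everything exactly observed."""
    return f_c(t1, t2, C)

def on_d1l(lo, hi, t2, C):
    """D_1l & C_2 or D_1l & E_2, with (lo, hi] = (v_{l-1}, v_l]."""
    return midpoint(lambda s: f_c(s, t2, C), lo, hi)

def on_e1(vm, t2, C):
    """E_1 & C_2 or E_1 & E_2: illness after v_m (possibly never before C)."""
    return midpoint(lambda s: f_c(s, t2, C), vm, C) + f_c(C, t2, C)
```

In `on_e1` the integrand vanishes for $s_1 > \tilde T_2$, which is the remark made for $E_1 \cap C_2$.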

4.5 Ignorability of the stochastic censoring by death 

In many examples of interest in epidemiology, one of the processes, say $N_p$, counts death, and the observation of $N_1,\ldots,N_{p-1}$ may be right-censored by the time of death $T_p$ (in addition to other types of coarsening). For instance, suppose it has been planned to visit a subject at discrete visit times $v_0,\ldots,v_m < C$ to observe the process $N_1$ (representing an illness status); the last visit time is necessarily random: it is $v_{M_1}$, where $M_1 = \max_l(l : v_l < T_p)$. To treat this problem we have to resort to the stochastic GCMP, which has the same definition as the deterministic GCMP except that the response process is stochastic; it will be denoted by $R$. It is assumed that $R$ is observed. In this problem we have to consider the σ-field $\mathcal{G} = \mathcal{F}_C \vee \mathcal{R}_C$, and the observed σ-field is $\mathcal{O} = \sigma(R_{jt}, R_{jt}N_{jt};\ 0 \le t \le C;\ j = 1,\ldots,p)$. Consider the case where the event $\{R = r\}$ has a positive probability. If the mechanism leading to incomplete data had been deterministically fixed to be equal to $r$, the observed σ-field would be $\mathcal{O}^r = \sigma(r_j(t)N_{jt};\ 0 \le t \le C;\ j = 1,\ldots,p)$. The mechanism leading to incomplete data is ignorable if, on the event $\{R = r\}$, using $\mathcal{L}_{\mathcal{O}}$ leads to the same inference as using $\mathcal{L}_{\mathcal{O}^r}$. Commenges & Gegout-Petit (2005) gave general conditions of ignorability.

Let us treat here the specific problem in which we assume that the stochastic nature of $R$ comes only from the right-censoring of the other processes by death. In that case the response processes can be written:

$$R_{jt} = r_j(t)1_{\{N_{pt}=0\}},\quad 0 \le t \le C;\ j = 1,\ldots,p-1,$$

and we assume that $R_{pt} = 1$, $0 \le t \le C$; the $r_j(t)$'s are deterministic functions as in Definition 2. Define $M_j = \max_l(l : v^j_l < T_p)$ and consider the event $A = \bigcap_{j=1}^{p-1}\{M_j = m_j\}$ for some choice of the $m_j$'s compatible with the deterministic discontinuities $v^j_l$; $A$ then has a positive probability if the intensity of $N_p$ does not vanish on $[0, C]$. Since the $R_j$'s are functions of $N_p$, the observed σ-field can be written $\mathcal{O} = \sigma(r_j(t)1_{\{N_{pt}=0\}}N_{jt},\ j = 1,\ldots,p-1;\ N_{pt};\ 0 \le t \le C)$. For each $A$ we can define deterministic functions as $\tilde r_j(t) = r_j(t)$, $t \le v^j_{m_j}$; $\tilde r_j(t) = r_j(v^j_{m_j}+)$, $t > v^j_{m_j}$; $j = 1,\ldots,p-1$. For the deterministic GCMP specified by the $\tilde r_j(t)$'s, the observed σ-field is $\tilde{\mathcal{O}} = \sigma(\tilde r_j(t)N_{jt};\ 0 \le t \le C;\ j = 1,\ldots,p)$, where $\tilde r_p(t) = 1$, $0 \le t \le C$. Because the stochastic right-censoring depends only on $N_p$, which is observed, it is clear that $A \in \mathcal{O}$. Moreover, because the intensities of the processes $N_j$, $j = 1,\ldots,p-1$, are zero after death, we have $\mathcal{O} = \tilde{\mathcal{O}}$ on $A$. Thus we can apply Lemma 3 to find that on $A$ we have $\mathcal{L}_{\mathcal{O}} = \mathcal{L}_{\tilde{\mathcal{O}}}$. The consequence is that in this case we can use the formulae derived in the deterministic framework (although the scheme is not deterministic), interpreting $v^j_{M_j}$ as the last discontinuity of $r_j(t)$ before death.
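In practice the reduction amounts to truncating the visit plan at the last visit before death and then treating the truncated plan as fixed. A sketch under the assumptions of this subsection (sorted planned visits with $v_0 = 0$, illness observed only at visits; the function names are hypothetical):

```python
import bisect

def last_visit_index(visits, t_death):
    """M = max{l : v_l < T_p} for sorted planned visits v_0 = 0 < v_1 < ..."""
    return bisect.bisect_left(visits, t_death) - 1  # number of visits strictly before death, minus 1

def effective_visits(visits, t_death):
    """Deterministic plan used on A = {M = m}: the planned visits v_0, ..., v_M."""
    return visits[: last_visit_index(visits, t_death) + 1]

def illness_set(visits, t_illness, t_death):
    """Observed illness information: ('D', v_{l-1}, v_l) or ('E', v_M). Assumes t_illness > 0."""
    vis = effective_visits(visits, t_death)
    if t_illness > vis[-1]:
        return ("E", vis[-1])  # only T_1 > v_M is known
    l = bisect.bisect_left(vis, t_illness)  # v_{l-1} < t_illness <= v_l
    return ("D", vis[l - 1], vis[l])

print(illness_set([0.0, 1.0, 2.0, 3.0, 4.0], 3.1, 3.2))  # ('E', 3.0)
```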

4.6 Extension to general multi-state models 

Any multi-state process can be represented by an informationally equivalent counting process (this is clear from the Borgan representation). The approach we have developed for deriving the likelihood in the GCMP context can be applied to general counting processes: that is, compute the likelihood ratio for continuous-time observation and take its conditional expectation given $\mathcal{O}$. However, it seems nearly impossible to obtain a general formula in that case, in particular because the number of transitions which may occur between two observation times is not bounded. In applications, an unbounded number of transitions is of course not realistic. For instance, consider a two-state reversible model for diarrhoea: state 0, no diarrhoea; state 1, diarrhoea. By definition a period of diarrhoea lasts a certain time, say one day. In that case the number of transitions between two observation times is bounded. It is then possible to cast the problem in the OJC framework, because a counting process making at most $k$ jumps can be represented by a $k$-dimensional OJC process.
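The representation in the last sentence is easy to write down: the $j$-th component is the one-jump process $1_{\{T_j \le t\}}$, where $T_j$ is the time of the $j$-th transition. The sketch below assumes transitions are recorded as a list of times and that the two-state process starts in state 0; the function names and episode times are illustrative only.

```python
import math

def to_ojc(jump_times, k):
    """Represent a counting process with at most k jumps by k one-jump processes.

    T_j is the time of the j-th jump (math.inf if fewer than j jumps occurred),
    so that N_t = sum_j 1{T_j <= t}.
    """
    times = sorted(jump_times)
    if len(times) > k:
        raise ValueError("more than k jumps")
    return times + [math.inf] * (k - len(times))

def count(t, ojc_times):
    """N_t recovered from the one-jump components."""
    return sum(1 for T in ojc_times if T <= t)

def diarrhoea_state(t, ojc_times):
    """Reversible two-state model started in state 0: transitions alternate 0 -> 1 -> 0 ..."""
    return count(t, ojc_times) % 2

# Two episodes: onsets at days 3 and 10, recoveries at days 4 and 11.
T = to_ojc([3.0, 4.0, 10.0, 11.0], k=6)
print(T)                                                        # [3.0, 4.0, 10.0, 11.0, inf, inf]
print([diarrhoea_state(t, T) for t in (2.5, 3.5, 5.0, 10.5)])  # [0, 1, 0, 1]
```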
5 Illustration on dementia-institution-death models

5.1 Multi-state and counting process models

A five-state non-homogeneous Markov model for dementia, institutionalization and death was proposed by Commenges & Joly (2004) (see Figure 1); we present this model with somewhat different notation to agree with our general notation of section 4. The first aim was to estimate the eight transition intensities. It was proposed to make no parametric assumption on $\alpha_{01}(t)$, $\alpha_{02}(t)$, $\alpha_{04}(t)$ but to relate the other transition intensities parametrically to these three basic intensities. Proportionality of the transition intensities toward dementia was assumed, $\alpha_{23}(t) = \alpha_{01}(t)e^{\eta_2}$, as well as proportionality of the transition intensities toward institution, $\alpha_{13}(t) = \alpha_{02}(t)e^{\eta_1}$. It was assumed that the transitions toward death are all proportional: $\alpha_{14}(t) = \alpha_{04}(t)e^{\eta'_1}$; $\alpha_{24}(t) = \alpha_{04}(t)e^{\eta'_2}$; $\alpha_{34}(t) = \alpha_{04}(t)e^{\eta'_{12}}$. As shown in section 3, this model can be represented by a trivariate process $N = (N_1, N_2, N_3)$, where $N_1$ counts dementia, $N_2$ counts institutionalization and $N_3$ counts death; the value of the five-state process $X$ at $t$ is obtained by reading $N_{3t}N_{2t}N_{1t}$ as a number in base 2: $X_t = \min(4N_{3t} + 2N_{2t} + N_{1t}, 4)$. The processes $X$ and $N$ are equivalent in the sense that they generate the same filtration. Moreover, if under a probability measure $P$, $X$ is Markov with transition intensities $(\alpha_{hj},\ h = 0,\ldots,4;\ j = 0,\ldots,4)$, the intensity of $N$ can be deduced by the general formula (1). The model proposed by Commenges & Joly (2004) is a Markov semi-parametric multiplicative model; we write the intensities as functions of $T_j$, $j = 1,2,3$, and we can verify that the intensities at $t$ only depend on $T_j \wedge t$, $j = 1,2,3$ (because, for instance, $\{T_j \ge t\} = \{T_j \wedge t \ge t\}$); they are:

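$$
\begin{aligned}
\lambda^\theta_1(t;T_1,T_2,T_3) &= \alpha_{01}(t)1_{\{T_1\ge t\}}1_{\{T_3\ge t\}}e^{\eta_2 1_{\{T_2<t\}}},\\
\lambda^\theta_2(t;T_1,T_2,T_3) &= \alpha_{02}(t)1_{\{T_2\ge t\}}1_{\{T_3\ge t\}}e^{\eta_1 1_{\{T_1<t\}}}, \qquad (14)\\
\lambda^\theta_3(t;T_1,T_2,T_3) &= \alpha_{04}(t)1_{\{T_3\ge t\}}e^{\eta'_1 1_{\{T_1<t\}}+\eta'_2 1_{\{T_2<t\}}+(\eta'_{12}-\eta'_1-\eta'_2)1_{\{T_1<t\}}1_{\{T_2<t\}}},
\end{aligned}
$$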

where $\theta = (\eta;\ \alpha_{0j}(\cdot),\ j = 1,2,4)$ and $\eta$ represents the vector of parameters named with this letter. It is attractive to consider some non-Markov models and to include explanatory variables $Z_i$ in the model for subject $i$.
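As an illustration of the base-2 encoding and of the intensities (14), the sketch below codes the state map and the three intensities with constant baseline intensities. The numerical values of the $\alpha_{0j}$ and of the $\eta$'s are invented, and the names are hypothetical; the code only mirrors the parametrization written in (14).

```python
import math

def state(n1, n2, n3):
    """X_t from (N_1t, N_2t, N_3t): read N3 N2 N1 as a base-2 number, capped at 4 (death)."""
    return min(4 * n3 + 2 * n2 + n1, 4)

# Hypothetical constant baselines and parameters (illustration only).
ALPHA01, ALPHA02, ALPHA04 = 0.02, 0.01, 0.03
ETA1, ETA2 = 0.5, 0.7                  # ETA1: dementia on institution; ETA2: institution on dementia
ETA1P, ETA2P, ETA12P = 1.0, 0.4, 1.6   # death: demented only, institutionalized only, both

def intensities(t, T1, T2, T3):
    """(lambda_1, lambda_2, lambda_3) of (14) at time t, with constant baselines."""
    dem, inst = float(T1 < t), float(T2 < t)           # events that occurred before t
    at_risk_dem = float(T1 >= t) * float(T3 >= t)      # not yet demented and alive
    at_risk_inst = float(T2 >= t) * float(T3 >= t)     # not yet institutionalized and alive
    lam1 = ALPHA01 * at_risk_dem * math.exp(ETA2 * inst)
    lam2 = ALPHA02 * at_risk_inst * math.exp(ETA1 * dem)
    lam3 = ALPHA04 * float(T3 >= t) * math.exp(
        ETA1P * dem + ETA2P * inst + (ETA12P - ETA1P - ETA2P) * dem * inst)
    return lam1, lam2, lam3
```

In state 3 (demented and institutionalized) the exponent of the death intensity reduces to $\eta'_{12}$, matching $\alpha_{34}(t) = \alpha_{04}(t)e^{\eta'_{12}}$.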





5.2 Likelihoods for coarsened observation from the dementia-institution-death model

Dementia is commonly observed in discrete time (because it is diagnosed at planned visits), while the times of death and institutionalization can be known exactly. A heuristic likelihood for this MDCO scheme of the semi-parametric Markov model was proposed in Commenges & Joly (2004). The results of the present paper allow us to derive the likelihood rigorously and to obtain a more concise formula, and this can also be done for non-Markov models. The model can be described as a trivariate counting process with $N_1$ observed at discrete times $v_0,\ldots,v_m$ and $N_2$ and $N_3$ observed in continuous time. We will consider that $v_m$ is the last visit actually made ($v_m < T_3$), and thanks to the result of section 4.5 we will be able to treat $v_m$ as deterministic. Thus we have $r_2(t) = r_3(t) = 1$, $0 \le t \le C$, and we observe (for each subject) $(\tilde T_j, \delta_j)$, $j = 2,3$, using the standard notation of section 4.1; note that $\delta_j = 1_{C_j}$, $j = 2,3$, with the event $C_j$ (not to be confused with the time $C$) defined in section 4.4. As for $N_1$, we observe the events $D_{1l}$, $l = 1,\ldots,m$, and $E_1$ (or equivalently the values of $1_{D_{1l}}$, $l = 1,\ldots,m$, and $1_{E_1}$). Using the formula of Theorem 2, and dropping the multiplicative constant, we have for instance on the pseudo-atom $D_{1l} \cap C_2 \cap C_3$:

$$\mathcal{L}_{\mathcal{O}} = \int_{v_{l-1}}^{v_l}\lambda^\theta_1(s_1;s_1,\tilde T_2,\tilde T_3)\,\lambda^\theta_2(\tilde T_2;s_1,\tilde T_2,\tilde T_3)\,\lambda^\theta_3(\tilde T_3;s_1,\tilde T_2,\tilde T_3)\,e^{-\Lambda^\theta_\cdot(C;s_1,\tilde T_2,\tilde T_3)}\,ds_1.$$

For a more compact expression of the likelihood we may group the formulae for the pseudo-atoms included in $D_{1l}$ by making use of the $\delta_j$'s and the $\tilde T_j$'s; the likelihood on $D_{1l} = \{T_1 \in (v_{l-1}, v_l]\}$ is:

$$\mathcal{L}_{\mathcal{O}} = \int_{v_{l-1}}^{v_l}\lambda^\theta_1(s_1;s_1,\tilde T_2,\tilde T_3)\,\lambda^\theta_2(\tilde T_2;s_1,\tilde T_2,\tilde T_3)^{\delta_2}\,\lambda^\theta_3(\tilde T_3;s_1,\tilde T_2,\tilde T_3)^{\delta_3}\,e^{-\Lambda^\theta_\cdot(C;s_1,\tilde T_2,\tilde T_3)}\,ds_1. \qquad (15)$$
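Similarly we obtain on $E_1 = \{T_1 > v_m\}$:

$$
\begin{aligned}
\mathcal{L}_{\mathcal{O}} = {}& \int_{v_m}^{\tilde T_3}\lambda^\theta_1(s_1;s_1,\tilde T_2,\tilde T_3)\,\lambda^\theta_2(\tilde T_2;s_1,\tilde T_2,\tilde T_3)^{\delta_2}\,\lambda^\theta_3(\tilde T_3;s_1,\tilde T_2,\tilde T_3)^{\delta_3}\,e^{-\Lambda^\theta_\cdot(\tilde T_3;s_1,\tilde T_2,\tilde T_3)}\,ds_1\\
&+ \lambda^\theta_2(\tilde T_2;\tilde T_3,\tilde T_2,\tilde T_3)^{\delta_2}\,\lambda^\theta_3(\tilde T_3;\tilde T_3,\tilde T_2,\tilde T_3)^{\delta_3}\,e^{-\Lambda^\theta_\cdot(\tilde T_3;\tilde T_3,\tilde T_2,\tilde T_3)}, \qquad (16)
\end{aligned}
$$

where $\lambda^\theta_\cdot(t;s,v,w) = \sum_{j=1}^{3}\lambda^\theta_j(t;s,v,w)$ and $\Lambda^\theta_\cdot(t;s,v,w) = \int_0^t\lambda^\theta_\cdot(u;s,v,w)\,du$. In this formula we have replaced the upper bound $C$ of the integral by $\tilde T_3$ because $\lambda^\theta_1(s_1;s_1,\tilde T_2,\tilde T_3) = 0$ for $\tilde T_3 < s_1 \le C$. For the same reason we have replaced $C$ by $\tilde T_3$ in the arguments of the last term and in $\Lambda^\theta_\cdot$.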

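For the Markov model (14), for instance, the intensities appearing inside the integrals of (15) and (16) can be computed to be:

$$
\begin{aligned}
\lambda^\theta_1(s_1;s_1,\tilde T_2,\tilde T_3) &= \alpha_{01}(s_1)e^{\eta_2 1_{\{\tilde T_2<s_1\}}},\\
\lambda^\theta_2(\tilde T_2;s_1,\tilde T_2,\tilde T_3) &= \alpha_{02}(\tilde T_2)e^{\eta_1 1_{\{s_1<\tilde T_2\}}},\\
\lambda^\theta_3(\tilde T_3;s_1,\tilde T_2,\tilde T_3) &= \alpha_{04}(\tilde T_3)e^{\eta'_1+\eta'_2 1_{\{\tilde T_2<\tilde T_3\}}+(\eta'_{12}-\eta'_1-\eta'_2)1_{\{\tilde T_2<\tilde T_3\}}}.
\end{aligned}
$$

Note that the last equality holds only on the set $\{s_1 < \tilde T_3\}$ (which is the case inside the integrals); note moreover that on this event $\lambda^\theta_3(\tilde T_3;s_1,\tilde T_2,\tilde T_3)$ does not depend on $s_1$: this comes from the Markov property of the model. The other quantities appearing in (15) and (16) can be computed by the same mechanical manipulations. The general formula can be applied even if the model is non-Markov.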
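For the Markov model with constant baseline intensities, (15) can be checked against a closed form. The sketch assumes invented values for the $\alpha_{0j}$ and the $\eta$'s, uses the fact that with constant baselines the intensities are piecewise constant between the event times (so $\Lambda^\theta_\cdot$ is computed exactly), and approximates the integral in $s_1$ by the midpoint rule; it illustrates the formula and is not the estimation procedure of Commenges & Joly (2004).

```python
import math

# Hypothetical constant baselines and parameters, following the parametrization of (14).
A01, A02, A04 = 0.02, 0.01, 0.03
ETA1, ETA2 = 0.5, 0.7
ETA1P, ETA2P, ETA12P = 1.0, 0.4, 1.6

def lam(t, T1, T2, T3):
    """Intensities (14) at time t; a censored component carries T_j = C."""
    d, i = float(T1 < t), float(T2 < t)
    l1 = A01 * (T1 >= t) * (T3 >= t) * math.exp(ETA2 * i)
    l2 = A02 * (T2 >= t) * (T3 >= t) * math.exp(ETA1 * d)
    l3 = A04 * (T3 >= t) * math.exp(ETA1P * d + ETA2P * i + (ETA12P - ETA1P - ETA2P) * d * i)
    return l1, l2, l3

def cum_total(t, T1, T2, T3):
    """Lambda_.(t; T1, T2, T3), exact because the intensities are piecewise constant."""
    cuts = sorted({0.0, t, *(x for x in (T1, T2, T3) if 0.0 < x < t)})
    return sum((b - a) * sum(lam((a + b) / 2, T1, T2, T3)) for a, b in zip(cuts, cuts[1:]))

def lik_15(lo, hi, T2, d2, T3, d3, C, n=2000):
    """Likelihood (15) on D_1l = {T_1 in (lo, hi]}, midpoint rule in s_1."""
    h = (hi - lo) / n
    total = 0.0
    for k in range(n):
        s1 = lo + (k + 0.5) * h
        l1 = lam(s1, s1, T2, T3)[0]
        l2 = lam(T2, s1, T2, T3)[1]
        l3 = lam(T3, s1, T2, T3)[2]
        total += l1 * l2 ** d2 * l3 ** d3 * math.exp(-cum_total(C, s1, T2, T3))
    return h * total

print(lik_15(2.0, 3.0, T2=1.5, d2=1, T3=6.0, d3=1, C=8.0))
```

With $\tilde T_2 < v_{l-1}$ the subject is institutionalized before the dementia interval, and the integrand is a single exponential in $s_1$, which gives the closed form used to check the result.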

A closer examination of the data available in the PAQUID study revealed that the observations were more incomplete than was assumed in Commenges & Joly (2004). When a subject is visited at time $v_l$, it is observed whether he lives in an institution or not, and the time of entry into institution is retrospectively recorded; information about institutionalization between the last visit and $\tilde T_3$ is often unknown for subjects who were not institutionalized at the last visit. The response processes are $R_{1t} = r_1(t)1_{\{N_{3t}=0\}}$, where $r_1(t) = 1$ for $t = v_l$, $l = 1,\ldots,m$, and zero elsewhere; $R_{2t} = 1_{\{t \le v_M\}}$, where $M = \max_l(l : v_l < T_3)$, so that $v_M$ is the last visit time before $C$ or death; and $R_{3t} = 1$ for $0 \le t \le C$. Thus $R_{2t}$ depends on the process $N_3$ and moreover, because of the retrospective recollection of the time of entry into institution, it depends on values of $N_3$ at times larger than $t$ (future values!): so it is not obvious that the mechanism leading to incomplete data is ignorable. The result of section 4.5 does not apply directly to this case, but the same approach can be used to prove that ignorability holds here. So we can still apply the general formula of Theorem 2.

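The observed variables for dementia and death are the same as before, but for institution we observe $C_2$, $1_{C_2}T_2$ and $E_2$. On $C_2 \cap D_{1l}$ (that is, we observe that the subject was institutionalized at $T_2$ and became demented between $v_{l-1}$ and $v_l$) the likelihood is given by (15); on $C_2 \cap E_1$ (the subject was institutionalized at $T_2$ and not demented at the last visit) it is given by (16). Two different formulae are necessary to describe the likelihood on $E_2$. As before, the formula of Theorem 2 gives the likelihood on pseudo-atoms, and we may group the formulae for the pseudo-atoms included in $E_2 \cap D_{1l}$ to obtain:

$$
\begin{aligned}
\mathcal{L}_{\mathcal{O}} = {}& \int_{v_{l-1}}^{v_l}\int_{v_M}^{\tilde T_3}\lambda^\theta_1(s_1;s_1,s_2,\tilde T_3)\,\lambda^\theta_2(s_2;s_1,s_2,\tilde T_3)\,\lambda^\theta_3(\tilde T_3;s_1,s_2,\tilde T_3)^{\delta_3}\,e^{-\Lambda^\theta_\cdot(\tilde T_3;s_1,s_2,\tilde T_3)}\,ds_2\,ds_1\\
&+ \int_{v_{l-1}}^{v_l}\lambda^\theta_1(s_1;s_1,\tilde T_3,\tilde T_3)\,\lambda^\theta_3(\tilde T_3;s_1,\tilde T_3,\tilde T_3)^{\delta_3}\,e^{-\Lambda^\theta_\cdot(\tilde T_3;s_1,\tilde T_3,\tilde T_3)}\,ds_1,
\end{aligned}
$$

and those included in $E_2 \cap E_1$ to obtain:

$$
\begin{aligned}
\mathcal{L}_{\mathcal{O}} = {}& \int_{v_M}^{\tilde T_3}\int_{v_M}^{\tilde T_3}\lambda^\theta_1(s_1;s_1,s_2,\tilde T_3)\,\lambda^\theta_2(s_2;s_1,s_2,\tilde T_3)\,\lambda^\theta_3(\tilde T_3;s_1,s_2,\tilde T_3)^{\delta_3}\,e^{-\Lambda^\theta_\cdot(\tilde T_3;s_1,s_2,\tilde T_3)}\,ds_1\,ds_2\\
&+ \int_{v_M}^{\tilde T_3}\lambda^\theta_1(s_1;s_1,\tilde T_3,\tilde T_3)\,\lambda^\theta_3(\tilde T_3;s_1,\tilde T_3,\tilde T_3)^{\delta_3}\,e^{-\Lambda^\theta_\cdot(\tilde T_3;s_1,\tilde T_3,\tilde T_3)}\,ds_1\\
&+ \int_{v_M}^{\tilde T_3}\lambda^\theta_2(s_2;\tilde T_3,s_2,\tilde T_3)\,\lambda^\theta_3(\tilde T_3;\tilde T_3,s_2,\tilde T_3)^{\delta_3}\,e^{-\Lambda^\theta_\cdot(\tilde T_3;\tilde T_3,s_2,\tilde T_3)}\,ds_2\\
&+ \lambda^\theta_3(\tilde T_3;\tilde T_3,\tilde T_3,\tilde T_3)^{\delta_3}\,e^{-\Lambda^\theta_\cdot(\tilde T_3;\tilde T_3,\tilde T_3,\tilde T_3)}.
\end{aligned}
$$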



Once the likelihood is computed, different approaches to inference are possible; in particular, Commenges & Joly (2004) proposed to use penalized likelihood, with smoothing coefficients chosen by cross-validation, and this method gave very satisfactory results in a simulation study.



6 Conclusion 

Many multi-state models can be considered as generated by simple events, so that a direct representation in terms of counting processes may be more economical; this is particularly the case when the events are not repeated and can be modeled by 0-1 counting processes. The multi-state point of view, however, retains its interest in many applications, in particular for reversible models: for instance, epileptic crises, repeated diarrhoea periods or repeated hospital stays might be modeled by a reversible two-state model if we wish to take into account the duration of the crises or of the hospital stays. So the multi-state and counting process points of view are rather complementary.

Representing multi-state models as counting process models allows us to derive the likelihood rigorously by the use of Jacod's formula. This was already known for continuous-time observation schemes, but we were able to apply this approach to the quite general GCMP scheme. This theory will be useful for developing complex models in life history event analysis. In addition, a compact general formula for the likelihood could be exploited, for instance, for designing software able to automatically treat any IM model: the user could specify the model by a routine giving the values of the intensities of the OJC model, which would be used to compute the likelihood corresponding to the observations described in the data set. This would be particularly feasible for parametric or penalized likelihood approaches; see Commenges et al. (2006) for a penalized likelihood approach to Markov and semi-Markov models.
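To make this suggestion concrete, here is a sketch of such an interface restricted to the simplest case of continuous observation: the user supplies a routine returning $\lambda_j(t; T_1,\ldots,T_p)$ and the code returns $\log f_C$, the sum of the log-intensities at the observed jumps minus the compensator on $[0, C]$. The signature `intensity(j, t, times)`, the illness-death routine and its hazard values are hypothetical choices for the sketch; handling coarsened observations would add the integrals of Theorem 2 on top of it.

```python
import math
from typing import Callable, Sequence

Intensity = Callable[[int, float, Sequence[float]], float]

def log_f_c(intensity: Intensity, times: Sequence[float], deltas: Sequence[int],
            C: float, n: int = 20000) -> float:
    """log f_C for continuously observed OJC processes.

    `intensity(j, t, times)` gives lambda_j(t; T_1, ..., T_p); censored components
    carry T_j = C. Intensities at observed jumps must be positive. The compensator
    integral is approximated by the midpoint rule on [0, C].
    """
    p = len(times)
    log_jumps = sum(math.log(intensity(j, times[j], times)) for j in range(p) if deltas[j])
    h = C / n
    compensator = h * sum(intensity(j, (k + 0.5) * h, times) for k in range(n) for j in range(p))
    return log_jumps - compensator

# Usage with a hypothetical illness-death routine (constant hazards, illustration only).
A, B, G = 0.1, 0.05, 0.2

def illness_death(j: int, t: float, T: Sequence[float]) -> float:
    alive = T[1] >= t                       # all intensities vanish after death
    if j == 0:                              # healthy -> ill, only while not yet ill
        return A * (T[0] >= t) * alive
    return (G if T[0] < t else B) * alive   # death rate depends on the illness status

print(log_f_c(illness_death, [1.0, 3.5], [1, 1], C=5.0))
```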
Figure 1: The five-state model for dementia, institutionalization and death.